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(57) ABSTRACT 

On completion of execution of a current block of 
instructions, a block completion process searches for poten- 
tial successor blocks, using block descriptors and egress data 
structures. For each potential successor block, the process 
compares a set of entry conditions associated with the block 
with the exit conditions of the current block and, if a match 
is found, selects the potential successor block as the current 
block and executes it. A consistency check is performed, to 
compare the block identity of the successor block with an 
expected block identity. Block-following code is selectively 
planted into translated blocks, to call a successor block 
directly, by-passing the block completion process. The 
block-following code is optimised, in that it contains tests 
for entry conditions only if the results of those tests are not 
known at the time the block- following code is planted. 

10 Claims, 2 Drawing Sheets 
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INSTRUCTION EXECUTION MECHANISM BRIEF DESCRIPTION OF THE DRAWINGS 



BACKGROUND TO THE INVENTION 

This invention relates to a mechanism for executing 5 
instructions in a computer system. 

The invention is particularly concerned with a computer 
system in which source code instructions are translated into 
target code instructions for execution on a particular pro- 
cessor. This may be required, for example, where one 1Q 
processor is being used to emulate another, in which case the 
instructions for the processor being emulated must be trans- 
lated into instructions for the emulating processor. 

One approach, referred to as interpretation, is to create a 
software model of the instruction set of the processor being 15 
emulated. This model operates by reading each target 
instruction, decoding it, and selecting one of a number of 
sequences that perform the same function as the instruction 
being emulated. This fetch/decode/execute sequence is 
repeated for each source code instruction in turn. 2Q 

A more efficient approach is to translate a block of source 
code instructions, rather than a single instruction. That is, the 
source code is divided into blocks, and each source code 
block is translated into a block of target code instructions, 
functionally equivalent to the source code block. Typically, 2 s 
a block has a single entry point and one or more exit points. 
The entry point is the target of a source code jump, while the 
(or each) exit is a source code jump. 

Translating blocks is potentially much more efficient, 
since it provides opportunities for eliminating redundant 30 
instructions within the target code block, and other optimi- 
sations. Known optimising compiler techniques may be 
employed for this purpose. To increase efficiency further, the 
target code blocks may be held in main memory and/or a 
cache store, so that they are available for re-use if the same 35 
section of code is executed again, without the need to 
translate the block. 

The present invention is concerned with the problem of 
sequencing the execution of such blocks, to ensure that each 
block is followed by an appropriate successor block. The 40 
object of the present invention is to provide an improved 
method for sequencing execution of code blocks. 

SUMMARY OF THE INVENTION 

According to the invention, a method of executing 45 
instructions in a computer system, comprises: 

(a) storing a plurality of blocks of instructions, each block 
having a specified set of entry conditions associated 
with it; 

(b) on completion of execution of a current block of 50 
instructions, searching for potential successor blocks; 
and 

(c) for each potential successor block, comparing the set 
of entry conditions associated with that block with the 
exit conditions of the current block of instructions and, 55 
if a match is found, selecting the potential successor 
block as the current block and executing it. 

It will be seen that the invention permits the use of 
multiple blocks for different entry conditions. The use of 
such multiple blocks is advantageous, in that each block can 60 
be tailored to its specific entry conditions, thereby improv- 
fc ing its efficiency of execution. A preferred form of the 
invention also allows block-following code to be planted in 
the blocks, to optimise the sequencing between the blocks. 

One embodiment of the invention will now be described 65 
by way of example with reference to the accompanying 
drawings. 



FIG. 1 is a block diagram showing an instruction execu- 
tion mechanism embodying the invention. 

FIG. 2 is a flow chart of a block completion process. 

DESCRIPTION OF AN EMBODIMENT OF THE 
INVENTION 

FIG. 1 shows a host computer system 10, including a disk 
memory 11, and a main memory 12. 

The disk memory holds a number of source code blocks 
13. As will be described, these blocks are copied (paged) 
into the main memory on demand. Therefore, at any given 
time, the main memory holds a number of source code 
blocks 14, corresponding to a subset of the blocks 13 in the 
disk memory. 

The source code blocks consist of sequences of instruc- 
tions in a source instruction set, which in this example is 
assumed to be the ICL VME instruction set. Each of the 
source code blocks has a set of entry conditions associated 
with it, these entry conditions consisting of: 

The program counter (PC) value at entry to the block. 

The accumulator size (ACS) value at entry to the block. 

It should be noted that a plurality of source code blocks 
may exist with the same PC, each having a different ACS 
value. 

An interpreter program 15 is provided for executing the 
source code blocks 14 in the cache. The interpreter takes 
each source code instruction in turn, decodes it, and selects 
one of a number of target code sequences in the instruction 
set of the host computer that perform the same function as 
the source code instruction. Details of the interpreter are not 
relevant to the present invention, and so it will not be 
described in any further detail. 

A translator program 16 is also provided. This can be 
called, as will be described, to translate a selected source 
code block 14 into a target code block 17, in the instruction 
set of the host computer. The translation takes account of the 
current entry conditions for the block, so that each translated 
block is specific to a particular set of entry conditions. 
Details of the translator are not relevant to the present 
invention, and so it will not be described in any further 
detail. 

Each source code block may have a block descriptor 18 
associated with it. Each block descriptor contains the fol- 
lowing information: 

BID An identity number, uniquely identifying this block 
. descriptor. 

BACS The accumulator size (ACS) on entry into the block. 
BPC The program counter (PC) value on entry into the 
block. 

BNIB The address of the translated target code (if any) 

corresponding to the source code block. 
USAGE A count of the number of times the block has been 

executed. 

When a new block descriptor is required, if there is no 
space in memory for creating a new block descriptor, an 
existing block descriptor may be reused. 

Every time a block descriptor is created or reused, it is 
allocated a new block identifier BID equal to a global block 
identifier value GLOBALJJID, and GLOBAL_BID is 
then incremented. This ensures that each block has a unique 
BID value. (When GLOBAL_BID eventually reaches its 
maximum value, it will be reset to zero, and in this case all 
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existing block descriptors and target code blocks must be AND 
flushed from the cache. However, this will happen only 

rarely). current_block.BNlB!-NULL 

The way in which the code blocks are threaded together ~, c , .. t , . . , , , . . 

is specified by data structures 19, referred to herein as 5 , The first Lne of this test chectetha the BNIB value m the 

egresses, linked to the block descriptors 18. Each block d ^ T ?^ pomle , d ,'° ^ } h \ BI f va ' ue n m M t , 1 j J 6 6 S; ess .» n ° l 

i . , „ . 1- i j . i- i_ n u "' Hie second line checks that the BNIB value in the 

descriptor may have one or more egresses linked to it. Each .,11 . - » 

egress contains information relating to a potential successor CUrrent b ' 0ck d * SCr ' Pt ° r " ™ l ^ U ^ ° f J** «>»*- 

block for the current block, as follows: tlons are l ™' * e ft f ocess con,mues at Ste P 204; « herwise 

it goes to Step 206. 

ddp tt,^ ™„r-» m /dp\ „,i„ 0 .i_ (Step 204) The process checks the BOPT flag in the 

BrC liie program counter (PC) value on entry into the 1 . j * j . • < ■ .... o „ 

successor block selected egress to determine whether or not block-following 

BID The identity number of the block descriptor associated ^p 3 ^"" f^f ^ "^""lil?* ^ 

with the successor block. If D0PT 18 Fals *' ,he Pr° cess continues at Step 205; other- 

BLK The address of the block descriptor associated with the is Wl f^, lt ^ne^iif' 6 ' 5 , . u , ■ c • 

successor block ^ p ' process plants block-following code into 

BOPT A flag, indicating whether or not block-following the current translated block, to link the current block to its 

code has been planted into the target code for this egress successor. Th B code will be performed next time this block 

r ° is executed. 

¥ i u c i i» i • (Step 206) The process then performs the action: 

In general, whenever execution of a code block is com- 20 v r ' r r 

pleted (either execution of source code by the interpreter, or Current_biock-current_biock.egress.BLK. 
execution of a translated target code) a block completion 

process 20 is executed, to select the appropriate successor This designates the successor block (i.e. the block indi- 

block. The exception to this is the case where a translated cated by the BLK value in the egress) as the new current 

block contains block-following code. In that case, the block- 25 block. 

following code may cause a jump to the required successor (Step 207) The process then checks the BNIB value in the 

block, directly from the translated code, without executing current block descriptor. If it is null, this indicates that the 

the block completion process, as will be described. block has not been translated, and the process continues at 

The block completion process 20 will now be described Step 208. Otherwise, it continues at Step 210. 

with reference to FIG. 2. 30 (Step 208) The process checks whether the USAGE count 

(Step 201) The process first accesses the block descriptor in the current block descriptor is greater than a predcter- 

18 associated with the current code block (i.e. the code block mined threshold value, indicating that the (as yet 

that has just completed), and checks whether this descriptor untranslated) block has been executed more than a certain 

has any egresses 19 linked to it, number of times. If the USAGE count is greater than the 

(Step 202) If there are one or more egresses linked to the 35 threshold, the process continues at Step 209; otherwise, it 

current block, the process selects each egress in turn, and proceeds to Step 211. 

performs a test to check whether this is a valid egress for the (Step 209) The process calls the translator 16, to translate 

current conditions. This test can be expressed as follows: the block. The translated code is stored in the main memory, 

and the BNIB value in the current block descriptor is set 

40 equal to the address of the translated code. 

The translated code has two entry points: a first entry 

£ - ____^^ JES Point which is used when entry is made from the Mock 

ACS « current_btoc_egress.BLK->BACS and completion process, and a second entry point for use when 

Current_block.cgress.BID — current block-egress. BLK- > BID entry is made from block -following Code. 

45 (Step 210) A jump is then made to the first entry point of 

The first line of this test checks that the BLK value in the the translated code, and this code is executed, 

egress is not null, i.e. the egress points to a valid successor ( Ste P 211 > If the USAGE coum is not 8 realer lhan the 

block. The second line checks that the current program threshold, the process calls the interpreter 15, to execute the 

counter value (PC) is equal to the BPC value in the egress. bI ° ck * ^ f USA p E ">™t is incremented. 

The third line checks that the current accumulator size 50 < Ste P 212 ) If u ™ s found at Ste P 201 that the current 

(ACS) is equal to the BACS value in the descriptor pointed block does not have any egresses, or if no valid egresses 

to by the BLK value in the egress. The second and third lines ™« fouDd al Step 202, the process searches through the 

therefore check that the entry conditions for the successor block descriptors 14, looking for a successor block whose 

block are the same as the conditions on exit from the current entr y conditions match the exit conditions (PC, ACS) of the 

block. The fourth line is a consistency check, which con- 55 currem bloc k* 

firms that the BID value in the egress equals the BID value < Ste P 213 > If the rea - uired successor block is not found in 

in the descriptor pointed to by the BLK value in the egress. tne currenl sel of block descriptors, then the source code 

This consistency check is necessary to ensure that the block 1314 15 ex P lored to find the required successor block, i.e. to 

pointed to by the egress is the same block as when the egress find lnc sourcc code bIock whose entr y conditions match the 

was created, i.e. the block descriptor has not been re-used. 60 exit conditions (PC, ACS) of the current block, and a new 

The egress is considered valid only if all four of these block desenptor 18 is created to describe it. In the new block 

conditions are true descriptor, BACS and BPC are set equal to the values of 

(Step 203) If a valid egress is found in Step 202, the ACS and PC on exit from the currenl block ' BNIB * ^ { t0 

process performs a test to check whether both the current NUEL ' and USAGE is set to 0. 

block and the successor block have been translated: 65 . ( Sle P 214 ) A new pointing to the successor block, 

is then created and linked to the current block descriptor. The 

cuneat_biock.egress.BLK->BNiB!»NULL BOPT flag in the new egress is set to FALSE. 
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(Step 215) The process then designates the successor block without having to call the block completion process, 

block as the new current block. Operation continues at Step Otherwise, the block completion process 20 will be called. 

208 as described above. As mentioned above, each translated target code block 17 

In summary, it can be seen that on completion of a block, has two entry points: a first entry point which is used when 

the block completion process searches for potential succes- 5 entry is made from the block completion process 20, and a 

sor blocks. For each potential successor, the entry conditions second entry point for use when entry is made from the 

(PC, ACS) associated with that block are compared with the block- following code. 

conditions on exit from the current block. If they match, the The second entry point performs the following consis- 

successor block becomes the new current block, and a jump tency check: 

is made to it. 10 i F parent_block. B ID !-current_block. egress. BLK->B ID 

The block-following code planted at Step 205 above will THEN 

now be described. The block-following code planted n ... 1 1 .• 

, , 11 1 ] fno , . - Return to block completion process 

depends on whether the values of PC and ACS on exit from r 1 

the block are predictable at the time the block-following END IF 

code is planted, or can be determined only at run time. There 15 Current_block«current__block.egress.BLK 

are four cases, as follows. where pare nt_b lock is the block from which this code was 

generated. 

Case 1: both PC and ACS are predictable. In this case, the In this consistency check, parent_b lock. BID is planted as 

block-following code includes an unconditional jump to a literal value, and hence does not require access to the 

the translated code for the successor block, as follows: 20 egress data structure. 

JUMP current„block.egress.BLK->BNIB. consistency check confirms that the block identifier 

of the block being entered is equal to the expected block 

Case 2: PC not predictable, ACS predictable. In this case, the identifier. If so, this block is designated as the new current 

block-following code includes the following: block > a " d execution of the translated code continues. 

IF current__block.egrcss.PC==PC 25 Otherwise, the block completion process 20 is returned to. 

TH£N It can be seen that planting the block-following code 

JUMP current_block.egre SS .BLK->BNIB imp™™ V format, because it by-passes the standard 

END IF block completion process 20 in cases where it is known that 

a valid translated successor block exists, with the correct 

j ■ . . . 1 . _, , r m entry conditions. Performance is also improved because the 

In other words, a jump is made to the translated code for JU , c „ . , t . , , . , 

iU , f f iL 11^ 1 * tL block-following code contains literal values, and so does not 

the successor block, only 11 the PC value on entry to the . " . , . . , 

, . , * , , J require to access the egress data structures, 

successor block is equal to the current PC value. n 0 

^ . .™ ... , - _ , . , SOME POSSIBLE MODIFICATIONS 
Case 3: PC predictable, ACS not predictable. In this case, the 

block-following code includes the following: 35 It will be appreciated that many modifications may be 

IF current„block.egress.BLK->BACS=«ACS made to the system described above without departing from 

'THEN the scope of the present invention. For example, entry 

JUMP current_block,egress.BLK->BNIB conditions other than (or additional to) PC and ACS may be 

Ejsro ip tested for, 

40 What is claimed is: 

In other words, a jump is made to the translated code for A method of executing instructions in a computer 

the successor block, only if the ACS value on entry to the system, the method comprising: 

successor block is equal to the current ACS value. (a) storing a plurality of blocks of instructions, each block 

45 having a specified set of entry conditions associated 

Case 4: both PC and ACS are unpredictable. In this case, the with it; 

block-following code includes the following: (b) executing a current block of instructions to produce a 

IF current_block.egress.PC==PC AND set of run-time exit conditions; 

current_block.egress.BLK->BACS==ACS (c) on completion of execution of said current block of 

THEN 50 instructions, searching for potential successor blocks; 

JUMP current_block.egress.BLK->BNIB and 

END IF (d) for each potential successor block, comparing the set 

of entry conditions associated with that block with the 

In other words, a jump is made to the translated code for run-time exit conditions of the current block of instruc- 

the successor block, only if both the PC and ACS value on 55 dons and, if a match is found, selecting the potential 

entry to the successor block are equal to the current PC and successor block as the current block and executing it. 

ACS values. 2. A method according to claim 1, further including 

Thus, the block- following code is optimised, in that it selectively planting executable block-following code into 
contains tests for entry conditions only if the results of those the current block of instructions on completion of execution 
tests are unknown or unpredictable at the time the block- 60 of said current block of instructions, said executable block- 
following code is planted. following code including instructions to call a successor 

The values current_block. egress. PC, currents block directly when said current block of instructions is next 

block.egress.BLK and current„block. egress. BLK->BACS executed. 

are planted as literal values, and hence do not require access 3. A method according to claim 2 wherein the planted 

to the egress data structure. 65 block-following code is optimised, in that it contains tests 

If all the tests in the block-following code are successful, for entry conditions only if the results of those tests are not 

a jump will be made to the translated code for the successor known at the time the block-following code is planted. 
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4. A method according to claim 1 including 

(a) creating a number of block descriptors, each of which 
holds information relating to a particular block; and 

(b) creating a number of egress data structures, each 
associated with one of the block descriptors, each 
egress pointing to a potential successor block; and 

(c) accessing the egress data structure for the potential 
successor block to obtain the entry conditions for that 
block. 

5. A method of executing instructions in a computer 
system, the method comprising: 

(a) storing a plurality of blocks of instructions, each block 
having a specified set of entry conditions associated 
with it; and 

(b) planting executable block- folio wing code into at least 
some of said blocks, and* 

(c) executing a current block of instructions to produce a 
set of run-time exit conditions for said current block; 

(d) said executable block-following code including tests 
to determine whether the run-time exit conditions for 
the current block match the entry conditions for a 
specified successor block; 

(e) wherein if said tests are satisfied the executable 
block-following code causes a jump directly to the 
specified successor block. 

6. A method according to claim 5 wherein the block- 
following code is optimised, in that it contains tests for exit 
conditions only if those exit conditions are unpredictable at 
the time the block -following code is planted. 

7. A method according to claim 5 wherein, if said tests are 
not successful, the block-following code calls a block- 
completion process to search for a successor block whose 
entry conditions match the exit conditions of the current 
block. 



54,373 Bl 

8 

8. A method according to claim 5 wherein, on entry from 
the block-following code of a predecessor block, the suc- 
cessor block performs a consistency check to confirm that it 
is the correct successor block and, if this consistency check 
5 is unsuccessful, the successor block calls a block- 
completion process to search for a successor block whose 
entry conditions match the exit conditions of the predecessor 
block. 

3Q 9. A method according to claim 5 wherein the block- 
following code is planted into a block only if that block and 
its successor block have both been translated into the native 
instruction code of the computer system. 

10, A method of executing instructions in a computer 

15 system, the method comprising: 

(a) storing a plurality of blocks of instructions; 

(b) storing a plurality of data structures holding informa- 
tion on entry conditions and potential successor blocks 
for each of the blocks of instructions; 

20 

(c) executing a current block of instructions to produce a 
set of run-time exit conditions; 

(d) on completion of execution of said current block of 
instructions, searching the data structures to locate a 

25 successor block whose entry conditions match the 
run-time exit conditions of the current block of instruc- 
tions; and 

(e) if a successor block whose entry conditions match the 
run-time exit conditions of.the current block of instruc- 
tions is found, selecting that successor block as the 
current block and executing it. 

* * * * * 
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[57] ABSTRACT 

A method for automatically instrumenting a computer pro- 
gram for dynamic debugging. Such a computer program 
comprising source code written in a programming language 
for executing instructions on the computer. The source code 
is provided as a sequence of statements in a storage device 
to the computer. Each of the statements are separated into 
tokens representing either an operator or at least one oper- 
and. A parse tree is built according to a set of rules using the 
set of tokens. The parse tree is instrumented to create an 
instrumented parse tree for indicating that an error condition 
occurred in the computer program during execution. Object 
code is generated from the instrumented parse tree and 
stored in a secondary storage device for later execution 
using an error-checking engine that indicates error condi- 
tions present in the computer program. 
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DEBUGGING 
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Reference to Microfiche Appendix and Printed 
Appendices 

A microfiche appendix is part of the specification which 
includes 17 microfiche and 1585 frames. 

In addition, two printed documents are part of the speci- 
fication and are included as 28 -page Appendix A and 
17-pagc Appendix B. Two computer source code listings are 
also part of this specification and are included as 14-page 
Appendix C and 3-page Appendix D. 

A portion of the disclosure of this patent document 
contains material to which a claim of copyright is made. The 
copyright owner has no objection to the facsimile reproduc- 
tion by anyone of the patent document or patent disclosure, 
as it appears in the Patent and Trademark Office patent file 
or records, but reserves all other copyright rights whatso- 
ever. 

FIELD OF THE INVENTION 

The present invention relates in general to automatic 
instrumentation methods for computer programs and in 
particular to automatic instrumentation methods for debug- 
ging of a computer program using a compiler parse tree. 

BACKGROUND OF THE INVENTION 

Typically, computer programs are developed in a cycle of 
writing, compiling, executing, debugging and rewriting 
computer code until a satisfactory program is attained. Two 
types of debugging can be performed: static debugging 
whereby the source code comprising the computer program 
is analyzed and corrected for errors prior to program 
execution, and dynamic debugging whereby runtime errors 
are detected by observing the behavior of the program 
during execution. 

A computer program can be dynamically debugged by 
employing a separate program or device to observe the 
behavior of the target computer program by monitoring 
memory locations. A computer program can also be dynami- 
cally debugged internally by introducing debug statements 
or routines into the program and observing the results during 
program execution. These statements can be manually intro- 
duced into the source code during the writing stage of 
program development. They can also be automatically intro- 
duced by a separate program at some stage in the develop- 
ment cycle prior to execution. The automatic introduction of 
debug statements or routines is known as instrumentation. 

Instrumentation can be used to perform tasks useful to 
debugging and analyzing a computer program. These 
include: analyzing code coverage to determine how often 
each program statement is executed and how long it takes to 
run; analyzing variables to determine what values are taken 
on and how often different parts of memory are accessed; 
analyzing program characteristics, such as memory usage 
and which functions are called using which parameters; and 
analyzing the correct use of program code by checking 
various assertions that ensure that what the program is doing 
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actually makes sense. In addition to the tasks listed above, 
instrumentation can be used to automatically generate test 
cases for dynamically testing the program. Test case data for 
program inputs can be generated automatically by the instru- 

5 mentation which then links to a test harness program to 
repeatedly execute the program with different inputs. 

Instrumentation can be automatically built into a com- 
puter program in a number of ways. First, instrumentation 
can be introduced before compilation by manipulating the 

10 source code and introducing instrumentation routines at 
appropriate locations. A problem with this approach is that 
it is slow and inefficient for large or highly complex pro- 
grams. 

Instrumentation can also be automatically introduced 
35 after compilation but before link editing by analyzing the 
relocatable object code output by the compiler. A problem 
with this approach is that the broader context of the target 
program is lost to the earlier stages of compilation. 
Consequently, the introduction of instrumentation must be 
20 limited to an analysis of memory locations and address 
pointers. 

Finally, instrumentation can be automatically introduced 
after link editing by manipulating the executable program. 
This approach suffers from the same problems as with 

25 relocatable object code. 

A further problem with these approaches is that the 
automatic introduction of instrumentation constitutes an 
extra stage in the program development cycle. 

30 Consequently, there is a need for a method of automatically 
instrumenting a computer program for dynamic debugging 
as an integral part of the program development cycle and 
without introducing an extra stage. 

SUMMARY OF THE INVENTION 

35 

The present invention overcomes the above problems and 
pertains to a method for automatically instrumenting a 
computer program for dynamic debugging. More 
specifically, such a computer program constitutes source 

40 code written in a programming language for executing 
instructions on a computer. The programming language has 
a grammar comprising operations having an operator and at 
least one operand and a set of rules for relating each such 
operator to its respective operand(s). The method consists of 

45 the steps of providing the source code as a sequence of 
statements in a storage device to the computer. Each of the 
statements are separated into tokens representing either an 
operator or at least one operand. 

A parse tree is built according to the set of rules using the 

50 set of tokens whereby the parse tree is a directed acyclic 
graph and constitutes a plurality of nodes connected by paths 
organized into a hierarchy of parent nodes representing 
operators connected to children nodes representing operands 
of the operators. The parse tree contains embedded error 

55 detection statements for communicating information to a 
runtime error-checking facility which can test for and indi- 
cate error conditions as they occur. The parse tree is instru- 
mented to create an instrumented parse tree for indicating 
that an error condition occurred in the computer program 

60 during execution. Object code is generated from the instru- 
mented parse tree and stored in a secondary storage device 
for later execution using an error-checking engine that 
indicates error conditions present in the computer program. 

65 BRIEF DESCRIPTION OF THE DRAWINGS 

The above and further advantages of this invention may 
be better understood by reference to the following detailed 
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description taken in conjunction with the accompanying 
drawings in which: 

FIG. 1 is a block diagram of a process for creating and 
debugging a computer program; 

FIG. 2 is a schematic diagram of a computer system for 5 
performing a method for automatically instrumenting a 
computer program for dynamic debugging according to the 
present invention; 

FIG. 3 is a software block and schematic diagram for a 
method for automatically instrumenting a computer program 
for dynamic debugging; 

FIG. 4 is a flow chart of a preferred embodiment of the 
method according to the present invention; 

FIGS. 5A, SB and 5C are a flow chart of a routine for 15 
determining the instrumentation to augment a parse tree; 

FIG. 6 is a source code listing of a computer program 
containing an uninitialized read of a memory variable; 

FIG. 7 is a diagram illustrating a parse tree representation 
of the source code listing in FIG. 6; 20 

FIGS. 8A and 8B are a flow chart for a routine for 
detecting an uninitialized read of a program variable error 
condition; 

FIG. 9 is a diagram illustrating an instrumented parse tree 
representation of the source code listing shown in FIG. 6; 25 

FIG. 10 is a source code listing of a computer program 
containing a write operation to an invalid memory address; 

FIG. 11 is a diagram illustrating a parse tree representa- 
tion of the source code listing in FIG. 10; 3Q 

FIG. 12 is a flow chart for a routine for detecting a write 
operation to an invalid memory address for a complex 
memory variable error condition; 

FIG. 13 is a diagram illustrating an instrumented parse 
tree representation of the source code listing shown in FIG. 35 
10; 

FIG. 14 is a source code listing of a computer program 
containing a dynamic memory manipulation error using a 
pointer memory variable; 

FIG. 15 is a diagram illustrating a parse tree representa- 40 
tion of the source code listing in FIG. 14; 

FIGS. 16 A and 16B are a flow chart of a routine for 
detecting a dynamic memory manipulation error using a 
pointer memory variable error condition; 

FIGS. 17A and 17B are a flow chart of a routine for 45 
performing a dynamic memory manipulation check; 

FIG. 18 is a diagram illustrating an instrumented parse 
tree representation of the source code listing shown in FIG. 
14; 

FIG. 19 is a source code listing of a program segment 50 
containing an inappropriate use of a pointer memory vari- 
able; 

FIG. 20 is a diagram illustrating a parse tree representa- 
tion of the source code listing shown in FIG, 19; 55 

FIGS. 21 A and 2 IB are a flow chart of a routine for 
detecting an inappropriate use of a pointer memory variable 
error condition; 

FIG. 22 is a diagram illustrating an instrumented parse 
tree representation of the source code listing shown in FIG. 60 
19; 

FIG. 23 is a source code listing of a computer function 
containing a memory leak error; 

FIG. 24 is a diagram illustrating a parse tree representa- 
tion of the source code listing shown in FIG. 23; 65 

FIG. 25 is a flow chart of a routine for detecting a memory 
leak error condition; 
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FIG. 26 is a diagram illustrating an instrumented parse 
tree representation of the source code listing shown in FIG. 
23; 

FIG. 27A is a computer function to be instrumented with 
an interface and FIG. 27B is the interface routine; 

FIG. 28 is a diagram illustrating a parse tree representa- 
tion of the program segment shown in FIG. 27 A; 

FIGS. 29A and 29B are a flow chart of a routine for 
inserting an interface; 

FIG. 30 is a diagram illustrating an instrumented parse 
tree representation of the program segment shown in FIG, 
27A; 

FIG. 31 is a source code listing of a computer function to 
be instrumented for automatic test case generation; 

FIG. 32 is a diagram illustrating a parse tree representa- 
tion of the source code listing shown in FIG. 31; and 

FIG. 33 is a diagram illustrating an instrumented parse 
tree representation of the source code listing shown in FIG. 
31. 

DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENT 

A block diagram of a process for creating and debugging 
a computer program is shown in FIG, 1. A source program 
10 comprising source code written in a programming lan- 
guage for executing instructions on a computer system is 
translated into an executable program 13 through a compi- 
lation process 11. The source program is translated into an 
equivalent program that can be run on a target computer 
system. The compilation process can involve compiling, 
interpreting or a similar translation of the source program. 

The compilation process also reports to the user the 
presence of static errors 10 in the source program due to 
errors in lexicography, syntax and semantics. For instance, 
a string of characters can fail to form a token recognized by 
the programming language (lexicographic error). Or, a set of 
tokens may violate a structure rule that the parser is unable 
to construct into a branch of a parse tree syntactic error). Or, 
a proper syntactic structure can be semantically incorrect 
because it fails to have any meaning with respect to the 
operation involved (semantic error). 

After the static errors are resolved, the program is further 
evaluated during the execution process 14 which detects 
dynamic errors 15 based on the runtime attributes of pro- 
gram operation. Dynamic errors are difficult to detect since 
they stem from logical or conceptual errors in the drafting of 
the source program rather than the concrete static errors 
resulting from an improper expression of the program. To 
detect dynamic errors, the program must be instrumented 
with debug routines during some phase of the compilation 
process whereby messages indicating the presence of a 
dynamic error are generated for evaluation by. the user. 

A schematic diagram of a computer system for perform- 
ing a method for automatically instrumenting a computer 
program for dynamic debugging according to the present 
invention is shown in FIG. 2. A main memory 23 contains 
a compiler 24 in the form of a computer program for 
carrying out the steps of compiling and a code instrumenter 
28 for automatically instrumenting a computer program. A 
microprocessor 22 runs the compiler using the source pro- 
gram file 20, which contains the source program 10, and the 
programming language definition file 21, which contains a 
grammar comprising operations and a set of rules. The 
microprocessor runs the compiler and creates an executable 
program file 25, which contains the instrumented executable 
program 13 in the form of object code. 
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During the execution of the compiler 24, an error may An instrumentation component 37 reads the stored parse 

arise due to some problem with the source program. Error tree and augments the parse tree with instrumentation for use 

messages are presented to the user on a display 26 and in dynamic debugging. The details of the instrumentation 

collected and stored in an error file 27. component are described in more detail below. It generates 

Source code listings for a computer program for use in 5 an instrumented parse tree 38. In a preferred embodiment of 

one embodiment of the present invention are included in the the present invention, eight categories of instrumentation are 

microfiche appendix. The source code is written in C Ian- used - These include detecting a read operation to an unini- 

guage. A description of the C language is detailed in B. W. tialized memory variable, detecting a read or write operation 

Kemighan&D.M. Ritchie, The C Programming Language, to an invalid memory address for a complex memory 

Prentice Hall (2d Ed, 1988), the disclosure of which is ™ variable, detecting a dynamic memory manipulation error 

hereby incorporated by reference. * using a pointer memory variable, detecting an inappropriate 

Hie computer program of the microfiche appendix is use of a P° mter mem °7 variabie > 1 c | etectin & a memoryleak 

preferably run on a Sun Microsystems SPARCstation 20 e ™ r : ^nd detecting a function call argument error These 

workstation running the Unix operating system. The source also includ r e a definabIe ^mentation routine known 

code listings are compiled using the instructions contained 15 as an mterface and an automatlc test case generation routine, 

in Appendix D, the disclosure of which is hereby incorpo- For eacn category, an analysis is performed to determine 

rated by reference. The resulting program is executed. which check or operation is appropriate and instrumentation 

Preferably, the workstation is equipped with 64 megabytes is embedded into the parse tree. Some categories require 

of random access memory and 4 gigabytes of secondary instrumentation to be introduced in several locations in the 

storage space. 20 parse tree. The result is an instrumented parse tree, which is 

A software block and schematic diagram for a method for stored m main memorv - 

automatically instrumenting a computer program for A code generation component 39 reads the instrumented 

dynamic debugging according to the present invention is parse tree and generates an object code file 40 with the 

shown in FIG. 3. One embodiment of the present invention instrumentation incorporated. This component is sometimes 

is described in "Compiler Intermediate Code Insertion for 25 divided into an intermediate code generator, a code 

Automatic Debugging and Test Case Generation/' which is optimizer, and a code generator. The intermediate code 

attached as Appendix A, the subject matter of which is generator transforms the instrumented parse tree into an 

hereby incorporated by reference as if set forth in full. The intermediate representation representing a program for an 

method uses a computer program consisting of five main abstract machine. This representation is useful for comput- 

components coordinated by a main control component 30. ing expressions, handling flow of control constructs and 

The source code component 31 reads a source code file 32 procedure calls. The code optimizer attempts to improve the 

stored in a secondary storage device and provides it to the performance of the intermediate code by decreasing the 

computer program. running time of the executable program. Finally, the code 

Alexical analysis component 33 separates the sequence of 35 generator creates relocatable machine code or assembly 

statements making up the source code in to tokens 34 by code 10 be 0Ut P ut 35 the ob J ect code filc 40 Memory 

scanning the characters comprising each statement and locations are selected for each variable used and the inter* 

grouping the characters into tokens. Each token has a mediate instructions are translated into a sequence of 

collective meaning in the context of the grammar defining machine instructions that perform the same task. These are 

the programming language that the source program is writ- 40 combined and output as object code, 

ten in. In most programming languages, key words, Throughout the operation of each component shown in 

operators, identifiers, constants, literal strings, and punctua- FIG. 3, reference is made to a language file 41 containing the 

tion symbols (such as parentheses, commas and semicolons) definition of grammar rules for the programming language, 

are treated as tokens. The tokens 34 are stored in the main Similarly, errors in the source program that are detected are 

memory. 45 output to the user through error messages 43 and error file 

A parsing and semantic analysis component 35 groups the ^2. 

tokens into grammatical phrases that are used to represent A preferred embodiment of the compiler 24 is shown in 

the instructions to be performed by the source program. FIG. 4. A file containing source code comprising the com- 

These grammatical phrases are represented by a parse tree puter program to be instrumented is provided to the compiler 

36, which is stored in main memory. 50 (block 51). The source code is written in a programming 

The parse tree describes the syntactic structure of the language for executing instructions on a computer, 

source program. A description of the data structures used for The programming language is defined by a grammar 

representing a parse tree in one embodiment of the present comprising operations having an operator (to identify the 

invention is attached as Appendix C and the subject matter operation) and at least one operand (upon which the opera- 

of which is hereby incorporated by reference. It is a hier- 55 tion is performed). In addition, the grammar includes a set 

archical representation of the instructions making up the of rules for relating each of the operations to their respective 

program structured as a directed acyclic graph comprising a operands. Preferably, the grammar is a context-free grammar 

hierarchy of parent and children nodes interconnected by having four components: a set of tokens, known as terminal 

paths with a root node representing the program's entry symbols; a set of nonterminals; a set of productions, where 

point. The blueprint for creating a parse tree is provided by 60 each production consists of a nonterminal, an arrow, and a 

the rules of the programming language grammar. Each path sequence of tokens and/or nonterminals; and a designation 

connecting a parent node to a child node represents a of one of the nonterminals as a start symbol. The productions 

relationship between an operator and its operands. A single define the set of operations comprising that grammar. Each 

instruction can comprise several operations and each such production is structured with the nonterminal on its left side, 

operation becomes a node in the parse tree. Operations can 65 followed by an arrow, followed by a sequence of tokens 

be defined recursively whereby an operation constitutes an and/or nonterminals on its right side. A description of a 

operand for another operation. context-free grammar is detailed in H. R. Lewis & C. H. 
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Papadimitriou, Elements of the Theory of Computation, addition, other information is communicated to the error- 
Prentice-Hall (1981), the disclosure of which is hereby checking engine through the instrumented code. This con- 
incorporated by reference. sists of declarations of variables and pointer addresses and 

Each separate source code statement is separated into their sizes, assignments of pointers, function entry and exit 

tokens (block 52), each token representing a terminal sym- 5 point indicators, and memory allocation indicators, 

bol in the grammar. A token can be either an operator or an The first category of dynamic memory error is the use of 

operand. In addition, source code comments and white space . uninitialized memory variables. This means that a memory 

(comprising blanks, tabs and new line characters) are variable is declared, but is not yet assigned a value before it 

removed during this step. is used by some other statement in the program. Referring to 

The set of tokens is used to build a parse tree (block 53) 10 FIG. 6, a source code listing of a computer program con- 
that represents the structure of the program operations. The taining an uninitialized read of a memory variable is shown, 
parse tree is structured with certain properties. This includes On line 3, an integer variable "i" is defined. On line 4, the 
having a root node labeled by a start symbol, each node variable "i" is read. Since variable "i" is uninitialized, a 
being labeled by a token or a null value, and each interior dynamic error occurs at runtime, 

node being labeled with a nonterminal. For each nontermi- 35 Referring to FIG. 7, a diagram illustrating a parse tree 

nal node, the children of that node correspond to the representation of the source code listing in FIG. 6 is shown, 

right-hand side of the production rule for the operation The function declaration (lines 1, 2 and 6) is represented by 

represented by the parent node. In addition to parse tree function declaration node 80. The declaration of variables 

representations, other intermediate representations for orga- " a " and "i" (line 3) is represented by variable declarations 

nizing tokens are possible. The same approach presented 20 node 81. The assignment operation (line 4) is represented by 

herein applies to other intermediate representations as well. assignment node 82 which is followed by return node 83 

Once completed, the parse tree is instrumented (block 54) (corresponding to line 5). The assignment operation (line 4) 

to communicate runtime information to the error-checking has two operands, variables "a" and "i", represented respec- 

engine to facilitate automatic detection of dynamic errors in lively by variable nodes 84 and 85. To detect the use of the 

the source program. This step requires a two-phase 25 uninitialized memory variable "i" (line 4), the parse tree 

approach. During the first phase, the source code is analyzed shown in FIG. 7 must be instrumented with debugging 

using a flow analysis procedure to determine the type of functionality so that the attempted assignment statement 

instrumentation that is appropriate. During the second phase, using variable "i" can be automatically detected by the error 

the parse tree is augmented with additional nodes compris- checking engine when the program is executed, 

ing the operations required to communicate runtime condi- The overall criteria for inserting such an error check is as 

tions to the error-checking engine which include appropriate follows. If there is a memory variable used in a program 

checks for runtime or dynamic errors or programmatic expression that is not known to have been assigned a value 

anomalies to the error-checking engine in the form of debug previously, an error check is inserted into the parse tree to 

output. 35 check that variable during execution. In addition, an error 

The instrumented parse tree is used to generate code for check is inserted to let the error-checking engine know that 

the target program which not only functions as was origi- the variable at that particular address in memory is initially 

nally intended, but also contains calls to instrumentation uninitialized at the start of execution, 

procedures which provide automatic error detection of Referring to FIGS. 8A and 8B, a flow chart for a routine 

dynamic program errors as well as an ability to automati- 4Q f or detecting an uninitialized read of a memory variable 

cally generate test cases. This is accomplished by passing error condition is shown. First, a memory address is 

runtime information to the error-checking engine which is retrieved from a program stack frame (block 90) which 

linked with the target program when the program executes. represents memory locations of local memory variables. A 

The instrumented parse tree is used to generate object flow analysis is performed on the source code to identify any 

code (block 55), which is stored in a secondary storage 45 read operation to the memory address for which it cannot be 

device. The steps of separating source code into tokens statically determined that the variable has been previously 

(block 52), building a parse tree (block 53), and generating initialized (block 91). If a read operation is found (block 92), 

object code (block 55) are described in A. V. Aho et al., instrumentation nodes are inserted into the parse tree in two 

Compilers, Principles, Techniques and Tools, Addison- locations. First, nodes are inserted after the parse tree node 

Wesley (1986), the disclosure of which is hereby incorpo- 50 corresponding to the stack frame containing the memory 

rated by reference. address for the program variable to be checked (block 93). 

Referring to FIGS. 5 A, 5B and 5C, a routine for instru- These nodes are for setting an internal indication to the 

menting a parse tree according to the present invention is error-checking engine that the memory variable is uninitial- 

shown. A step-wise procedure is followed to insert each of ized. Second, instrumentation nodes are inserted into the 

the seven categories of instrumentation into the parse tree. 55 P arse tree before the read operation (block 94) to indicate to 

Thus, nodes are inserted for detecting a read operation to an the error-checking engine that the memory variable being 

uninitialized memory variable (block 61), detecting a write read by the read operation is either initialized or uninitial- 

operation to an invalid memory address for a complex ized at that point in program execution. These nodes deter- 

memory variable (block 62), detecting a dynamic memory mine the status of the memory variable by referring to the 

manipulation error using a pointer memory variable (block 60 indication set by the instrumentation nodes for the stack 

65), detecting an inappropriate use of a pointer memory frame. 

variable (block 67), detecting a memory leak error (block Next, a flow analysis is performed on the source code to 

69), inserting a user-defined instrumentation routine identify a write operation to the memory address for the 

(interface) (block 73), and inserting an automatic test case program variable being checked, since any write operation 

generation routine (block 75). 65 w in caU se the memory variable in question to be initialized 

The seven categories of instrumentation perform checks (block 95). If a write operation is found (block 96), inslru- 

or augment the functionality of the original source code. In mentation nodes are inserted into the parse tree after the 
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nodes corresponding to the write operation for setting an operation to that array, the error-checking engine must check 
indication used by the error-checking engine to indicate that if the index into the array is valid. For complex memory 
the memory variable in question is initialized (block 97). variables, a similar declaration must be made to the error- 
Referring to FIG. 9, a diagram illustrating an instru- checking engine; however, the engine must perform a test 
mented parse tree representation of the source code listing 5 for whether the memory address being written to falls 
shown in FIG. 6 is shown. The instrumentation nodes for the outside of a valid memory address range defined by the 
stack frame are inserted as uninitialized node 100. The dimension operand used to declare the memory block size 
instrumentation nodes for the read operation are inserted as for the complex memory variable. 

check variable read node 101. The instrumentation nodes for Referring to FIG. 12, a flow chart for a routine for 

the write operation are inserted as copy initialize node 102. io detecting a write operation to an invalid memory address for 

Uninitialized node 100 indicates to the error-checking a complex memory variable error condition is shown. This 

engine that the variables "a" and "i" are uninitialized. Check involves a more general error check than for an array and is 

variable read node 101 indicates to the error-checking therefore presented initially. 

engine that an actual check for the uninitialized variable "i" ^irsi, a flow analysis is performed on the source code to 

should be performed. Finally, copy initialize node 102 15 ideati f y a declaration operation for a complex memory 

indicates to the error-checking engine that the variable "a" variable (block 120). Such a variable comprises a plurality 

is being assigned a value which initializes it by copying a of elements, each of which can be a constant value, a simple 

value from some other memory location. memory variable or a complex memory variable. A decla- 

The second category of dynamic error is a write operation ration operation for a complex memory variable comprises 

to an invalid memory address far a complex memory 20 two components: an identifier operand for identifying the 

variable. This is also known as memory corruption which variable and a dimension operand for identifying a memory 

occurs when a program writes to a location in memory that block size. If a declaration operation is found (block 121), 

is not valid. For instance, this can happen as a result of instrumentation nodes are inserted after the parse tree node 

writing off of the end of an array. Similarly, it can happen as corresponding to the declaration operation (block 122). 

result of writing to a location in memory that falls outside of 25 These nodes are for storing the dimension operand for use by 

the range of memory locations allocated to a complex the error-checking engine during execution, 

memory variable, such as a structure. Next> a flow aDa i ys i s is performed on the source code to 

A complex memory variable comprises a plurality of identify a write operation using the complex memory vari- 

elements, each of which can be a constant value, a simple 3o able being checked (block 123). If a write operation is found 

memory variable or a complex memory variable. An array (block 124), instrumentation nodes are inserted into the 

comprises a plurality of identical elements, each of which parse tree before the nodes corresponding to the write 

can be constant value, a simple memory variable, or a operation (block 125). During execution, the error-checking 

complex memory variable. engine can indicate that the write operation is writing to an 

Referring to FIG. 10, a source code listing of a computer 35 invalid memory address falling outside of the memory 

program containing a write operation to an invalid memory address range defined by the stored dimension operand, 

address is shown. An array "A" is defined comprising ten To check for a write to an invalid array memory location, 

integer elements (line 3). Each of these ten elements are an additional step is required to those shown in FIG. 12. It 

initialized to 0 (line 5) using a loop beginning at an index comprises augmenting the last step with inserting instru- 

value of 1 (lines 4 and 6). The valid indices for the array "A" 4Q mentation nodes into the parse tree to further indicate to the 

are 0 through 9. However, the loop begins with an index "i" error-checking engine that a write operation is being per- 

equaling 1 that is incremented during each successive itera- formed on an array element falling outside of the range of 

tion until the index "i" equals 10 (line 4). Thus, in the tenth valid array indices. 

iteration, the program attempts to set array element A[10] to Referring to FIG. 13, a diagram illustrating an instru- 

0. This is invalid since array "A" does not have an index 45 men ted parse tree representation of the source code listing 

value of 10 and therefore an overwrite dynamic error occurs. sn0WI1 m p IGt iq is shown. The instrumentation nodes for 

Referring to FIG. 11, a diagram illustrating a parse tree the declaration operation are inserted as declare node 130. 
representation of the source code listing in FIG. 10 is shown. The instrumentation nodes for the write operation are 
The function declaration (lines 1, 2 and 8) is represented by inserted as check array access node 131. Declare node 130 
function declaration node 110. The declaration of index 50 indicates to the errorTchecking engine that the array "A" is 
variable "i" and array "A" which has 10 elements (line 3) is declared and contains ten elements. Since the present pro- 
represented by variable declarations node 111. The loop gram is written in C language and array indices begin with 
operation (lines 4 and 6) is represented by loop node 112 0, the range of valid indices for array "A" are from 0 through 
which is followed by return node 113 (corresponding to line 9. The check array access node 131 indicates to the error- 
7). The assignment operation (line 5) has two operands, an 55 checking engine that a write operation is being performed on 
array element "A[i] M and an integer constant 0, represented an element of array "A" and that the value of the index, here 
respectively by nodes 115 and 116. index "i", should be checked to determine whether it falls 

To detect an array operation that is attempting to access an within the range of valid array indices, 

invalid memory location, the parse tree shown in FIG. 11 The third category of dynamic error is a dynamic memory 

must be instrumented with debugging functionality so that 60 manipulation error using a pointer memory variable. This 

the error can be automatically detected by the error-checking occurs when memory pointers no longer reflect the actual 

engine when the program is executed. Here, the array layout of memory due to problems with dynamic memory 

operation is an assignment to element A[10] on line 5 of the manipulation. This often involves a "dangling pointer" 

program. Element A[10] is out of bounds. which is a memory pointer which points to a block of 

The overall criteria for inserting this type of error check 65 memory that has since been "freed," that is, deallocated, 

is as follows. For arrays, the array variable and its size must While the memory pointer still points to the address of the 

be declared to the error-checking engine. For each write same freed memory block, the address is no longer a 
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representative of the dynamic state associated with the during execution. Each allocation record contains the fol- 

original pointer assignment. Six types of errors can occur, lowing information: block size, starting memory address for 

such as reading from or writing to a dangling pointer, the block, addresses of memory pointers that point to the 

passing a dangling pointer as an argument of a function, memory block, a list of memory pointers that are contained 
returning a dangling pointer from a function, freeing the 5 within the memory block, and state information regarding 

same memory block multiple times, freeing stack memory the memory block. 

(local variables), and attempting to free a memory block Next, a flow analysis is performed on the source code to 
using a pointer that does not point to the beginning of a valid identify an assignment operation to the selected pointer 
memory block. memory variable (block 166). If an assignment operation is 
Referring to FIG. 14, a source code listing of a computer 30 found (block 167), instrumentation nodes are inserted into 
program containing a dynamic memory manipulation error the parse tree after the nodes corresponding to the assign- 
using a pointer memory variable is shown. A pointer ment operation (block 168). These nodes are for indicating 
memory variable "ptr" is defined (line 3). Next, a to the error-checking engine that the pointer memory vari- 
10-character memory block is allocated and its pointer able may contain a different and possibly invalid memory 
assigned to pointer memory variable "ptr" (line 4). The 1S address. 

pointer memory variable "ptr" is incremented (line 5) and an The previous steps having been accomplished, the routine 

attempt is made to free the memory block that it points to can therefore perform a dynamic memory manipulation 

(line 6). However, the attempt can ultimately lead to check (block 169). Referring to FIGS. 17 A and 17B, a flow 

memory corruption since pointer memory variable "ptr no chart for a rou tine for performing a dynamic memory 
longer points to the start of the memory block that was 20 maQ i pu lation check is shown. This performs the six types of 

originally assigned to it. Therefore, a dynamic memory dynamic memory manipulation checks listed previously, 

manipulation error occurs. A fiow analysis fe performed on the source code t0 

Referring to FIG. 15, a diagram illustrating a parse tree identify a read operation or a write operation using the 

representation of the source code listing in FIG. 14 is shown. pointer memory variable being checked (block 180). If a 

The function declaration (lines 1, 2 and 8) is represented by rea d operation or a write operation is found (block 181), 

function declaration node 140. The declaration of pointer instrumentation nodes are inserted into the parse tree before 

memory variable "ptr" (line 3) is represented by variable the nodes corresponding to the read operation or the write 

declaration node 141. The allocation of the 10-character operation (block 182). During execution, the error-checking 

memory block is represented by call expression node 147, engine can indicate that the read operation or the write 

which has two operands, a function declaration and an operation is attempting to operate on a pointer memory 

argument, represented respectively by nodes 148 and 149. variable when it contains a dangling pointer, this is, a 

The result from this function call is assigned to the pointer memory address for a freed memory block, 

memory variable (line 4), which is represented by assign- Nexlj a flow analysis is perforrned on the source code to 

ment node 142. This node has two operands, a pointer identify a fu nctioil caU operation using the pointer memory 

memory variable and the function call, represented respec- variable being checked (51ock 183) , f a function call 

lively by nodes 146 and 147. The pointer increment opera- operation is found (block 184), instrumentation nodes are 

tion (line 5) is represented by node 143. The free memory inserted mto the parse tree before the nodes corresponding 

block operation (line 6) is represented by node 144, which t0 the cal] operation (b lock 185) . During execution, 

has two operands: a function call declaration and an the error-checking engine can indicate that the function call 

argument, represented respectively by nodes 150 and 151, operation is calling a function using a pointer memory 

The return operation (line 7) is represented by node 145. variable a memory address for a free d memory 

Referring to FIGS. 16A and 16B, a flow chart for a routine block, 

for detecting a dynamic memory manipulation error using a Next> a flow ana i ys i s ^ performed on the source code to 
pointer memory variable error condition is shown. This 45 identify a function call return operation using the pointer 

involves a more general error check than for the six specific mem ory variable being checked (block 186). If a function 

types of memory manipulation errors listed above and is call return operation is found (block 187), instrumentation 

therefore presented initially. nodes are i nser ted into the parse tree before the nodes 

First, a flow analysis is performed on the source code to corresponding to the function call return operation (block 
identify a declaration operation for a pointer memory vari- 50 188). During execution, the error-checking engine can indi- 

able (block 160), comprising an identifier operand for iden- cate that the function call return operation is returning a 

tifying the variable. If a declaration operation is found memory address for a freed memory block to the calling 

(block 161), instrumentation nodes are inserted after the function in the computer program. 

parse tree node corresponding to the declaration operation Finally, a flow analysis is performed on the source code 
(block 162). These nodes are for storing in a pointer record 55 t0 identify a free memory block operation using the pointer 

a value field for a memory address contained in the pointer memory variable being checked (block 189). If a free 

memory variable during execution. Initially, the pointer memory block operation is found (block 190), instrumenta- 

memory variable points to nothing and the pointer record is t ion nodes are inserted into the parse tree before the nodes 

therefore empty. corresponding to the free memory block operation (block 
A flow analysis is then performed on the source code to 60 191). During execution, the error-checking engine can indi- 

identify a memory allocation operation for allocating a cate that the free memory block operation is attempting to 

memory block to the pointer memory variable being free a memory block multiple times by using a pointer 

checked (block 163). If a memory allocation operation is memory variable pointing to an already freed memory block 

found (block 164), instrumentation nodes are inserted into or is attempting to free a stack frame (local variables) or is 
the parse tree after the nodes corresponding to the memory 65 attempting to free a memory block when the memory 

allocation operation (block 165). These nodes are for storing address does not equal the starting memory address of the 

an allocation record for use by the error-checking engine memory block. 
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Referring to FIG. 18, a diagram illustrating an instru- 
mented parse tree representation of the source code listing in 
FIG. 14 is shown. The instrumentation nodes for the decla- 
ration operation are inserted as declare node 200. The 
instrumentation nodes for the assignment operation are 5 
inserted as pointer assignment node 201. Similarly, the 
instrumentation nodes for a further assignment operation are 
inserted as reassign node 202. Finally, the instrumentation 
nodes for the pre-memory block operation inserted as check 
arg to deallocate node 203. Declaration node 200 indicates 3Q 
to the error-checkiug engine that the pointer memory vari- 
able "ptr" is declared and uninitialized. The pointer assign- 
ment node 200 and reassign node 202 indicate to the 
error-checking engine that the pointer memory variable 
"ptr" has been initialized and incremented, respectively. The J5 
check arg to deallocate node 203 indicates to the error- 
checking engine that the program is attempting to free the 
memory pointed to by the pointer memory variable "ptr." 

The fourth category of dynamic error is an inappropriate 
use of a pointer memory variable. Five types of errors can 2 q 
occur, comprising a pointer operation on a null pointer, a 
pointer operation on an uninitialized pointer, a pointer 
operation on a pointer that does not point to valid data, a 
pointer operation attempting to compare or otherwise relate 
memory pointers that fail to point to the same type of data 2S 
object, and an attempt to make a function call using a 
function pointer that does not point to a function. 

Referring to FIG. 19, a source code listing of a program 
segment containing an inappropriate use of a pointer 
memory variable is shown. Two long integer pointers "a" 30 
and "b" and a pointer to a function returning a long integer 
"foo" are defined (line 1). Next, the memory address of 
variable "a" is assigned using a cast to a pointer to a function 
returning a long integer "foo" (line 2). Finally, the return 
value of a function call to "foo" is assigned to variable "b" 35 
(line 3). This code segment is problematic because the 
function pointer "foo" actually points to a location in the 
program stack representing the memory block assigned to 
variable "a" instead of an appropriate entry point in the code 
segment. There fore, the function pointer "foo" has been 40 
inappropriately used and a pointer memory variable error 
condition occurs. 

Referring to FIG. 20, a diagram illustrating a parse tree 
representation of the source code listing in FIG. 19 is shown. 
The long integer variable declarations (line 1) are rep re- 45 
sented by variables declared node 210. The assignment 
operation (line 2) is represented by node 211, which has two 
operands, a variable (representing the left-hand side of the 
assignment) and an address expressions (representing the 
right-hand side of the assignment), represented respectively 50 
by nodes 212 and 213. The address expression node 213 
operates on variable "a", which is represented by variable 
node 214. The assignment node 211 is followed by assign- 
ment node 215, which represents the assignment to variable 
"b" (line 3). This node has two operands, a variable and a 55 
function call to "foo", represented respectively by nodes 216 
and 217. The call expression node 217 has one operand, a 
variable, represented by node 218. 

Referring to FIGS. 21 A and 21 B, a flow chart for a routine 
for an inappropriate use of a pointer memory variable error 60 
condition is shown. First, a flow analysis is performed on the 
source code to identify a declaration operation for a pointer 
memory variable (block 230), comprising an identifier oper- 
and for identifying the variable. If a declaration operation is 
found (block 231), instrumentation nodes are inserted after 65 
the parse tree node corresponding to the declaration opera- 
tion (block 232). These nodes are for storing in a pointer 
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record a value field for a memory address contained in the 
pointer memory variable during execution. Initially, the 
pointer memory variable points to nothing and the pointer 
record is therefore empty. 

Next, a flow analysis is performed on the source code to 
identify an operation using the pointer memory variable 
being checked (block 233). If an operation is found (block 
234), instrumentation nodes are inserted into the parse tree 
before the nodes corresponding to the operation (block 235). 
These nodes are for performing the five types of error checks 
listed above, including checking for operations on a null 
pointer, an uninitialized pointer, a pointer to invalid data, 
mismatched pointer types, and an invalid function call 
pointer. 

Referring to FIG. 22, a diagram illustrating an instru- 
mented parse tree representation of the source code listing in 
FIG. 19 is shown. The instrumentation node for the function 
pointer declaration operation is inserted as declare function 
"ptr" node 220. The instrumentation node for the function 
pointer check is inserted as func_ptr_check node 221. The 
declare function "ptr" node 220 is used by the error- 
checking engine for runtime pointer tracking. The func_ 
ptr_check node 221 is the actual check for a bad function 
pointer. During runtime, the error-checking engine deter- 
mines that the value assigned to the function pointer "foo", 
is an address on the stack and is not an appropriate function 
address. 

The fifth category of dynamic error is a memory leak 
error, which occurs when a dynamically allocated memory 
block is no longer referenced by a memory pointer and 
consequently can never be freed (deallocated). There are 
three types of errors. The first, a leak while freeing memory, 
can occur when a block of memory is freed which contains 
memory pointers that point to other allocated memory 
blocks. Any references to those allocated memory blocks are 
lost. The second, a leak return value, occurs when a function 
call returns an allocated memory block but the calling 
function does not assign that memory block to a pointer 
memory variable. The third, leak scope, occurs when a local 
pointer memory variable points to a memory block that is 
also local in scope and the function does not free the 
memory which it uniquely references before it goes out of 
scope. 

Referring to FIG. 23, a source code listing of a computer 
function containing a memory leak error is shown. A 
10-character memory block is allocated and its pointer 
assigned to pointer memory variable "ptr," which is a local 
variable defined in the same statement (line 3). The function 
"foo" uniquely references the memory block allocated 
whose memory address is assigned to the local pointer 
memory variable "ptr." The function returns (line 4) with 
"ptr" going out of scope. Consequently, the memory block 
formerly pointed to by "ptr" is leaked since "ptr" is no 
longer accessible. Therefore, a memory leak error occurs. 

Referring to FIG. 24, a diagram illustrating a parse tree 
representation of the source code listing in FIG. 23 is shown. 
The function declaration (lines 1, 2 and 5) is represented by 
function declaration node 240. The declaration of local 
pointer memory variable "ptr*' (line 3) is represented by a 
variable declaration node 241. Similarly, the allocation of 
the 10-character memory block is represented by call 
expression node 245, which has two operands, an argument 
and a function declaration, represented respectively by 
nodes 246 and 247. 

The function declaration calls a memory allocation rou- 
tine for dynamically allocating a block of memory. Such a 
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routine could be the "malloc( )" function call or the like in Instrumentation routines can also be used to introduce a 
C language. The result from this routine is assigned to the user-definable instrumentation routine known as an inter- 
local pointer memory variable (line 3), which is represented face. This type of routine enables a user to add their own 
by assignment node 242. This node has two operands, a m les for transforming the source code. An interface routine 
pointer memory variable and the function call, represented 5 can have the same behavior as the source code it is replacing 
respectively by nodes 244 and 245. The return operation or it can do something completely different, such as check- 
(line 4) is represented by node 243. ing values of var jables, simulating errors or performing any 

Referring to FIG. 25, a flow chart for a routine for other lype of dynamic tasks, 

detecting a memory leak error condition is shown. First, a ~ . n . fl A . t * „ , , 

flow analysis is performed on the source code to identify a 10 Really, ^" d <; fined interfaces allow the user to add 

declaration operation for a pointer memory variable (block 10 cu f om err ° r chec ^ n § 10 u func k tK l n calls 35 a n mean ? u of 

250), including an identifier operand for identifying the enforcing rules on the way that the function is caUed and the 

variable. If a declaration operation is found (block 251), Slde effects tnat 11 has on memory. These types of interfaces 

instrumentation nodes are inserted after the parse tree node check that a11 parameters are of the correct data type, that 

corresponding to the declaration operation (block 252). memory pointers point to memory blocks of the appropriate 

These nodes store a pointer record indicating information 15 size, and that each parameter value is within its correct 

about the block of memory that it points to. range. 

Next, a flow analysis is performed on the source code to Referring to FIG. 27 A, a program segment of a computer 

identify an exit from scope operation, such as a return from function to be instrumented with an interface is shown, 

a function call (block 253). If such an operation is found Here, the interface is for a memory allocation call using the 

(block 254), instrumentation nodes are inserted before the 20 "malloc( )" function (line 1). Referring to FIG. 27B, the 

parse tree node corresponding to the exit from scope opera- interface routine is shown. In this example, the interface is 

tion (block 255). These nodes are for detecting memory similar to a complicated macro definition because the given 

leaks. Thus, when the function exits or the pointer goes out function call is replaced by a user defined interface, 

of scope, the error-checking engine is informed by a "pop ™ „ c , t , , . . t 

,f \. t . T1 4l _ e \ to *u * The uc_ prefixed functions are expanded into function 

scope directive. Upon that occurrence, the engine can 25 n , i( _ L i j . 

examine the list of pointers declared in that scope. For each ca ^ t0 the ru " tim ^ back " end Processor Two such function 

pointer, if the block of memory that it is pointing to is only f lls are em P lo y ed "J thls exam P le - ™ e ( ) 

pointed to by a local pointer variable, the memory is leaked ^ ncil0n cal1 communicates error messages to the back-end 

when the pointer goes out of scope. processor (lines 5 and 10). The w iic_alloc( )" function call 

Memory leaks can be detected in one of two ways. The 30 communicates to the back-end processor that a block of 

first is during an assignment of a new address to a pointer memor y of (( si f " size " has been allocated and is pointed to 

variable. If the memory block that used to be pointed to by b y pointer "a" (line 8). 

the pointer is being reassigned and the memory block is only 1° addition, two further error checks are performed by the 

pointed to by that pointer, the block is leaked by the interface. First, it checks to see whether the size of the 

assignment operation. Second, a memory leak can occur 35 memory block being allocated is a positive number (lines 

upon the exiting of a scope. If there is a memory block which 4" 5 )- If it is, the memory allocation "malloc( )" function call 

is pointed to only by a pointer declared locally in scope is allowed to go forward (line 6). Next, the pointer memory 

within the function being exited, the memory block is variable "a" is checked to determine if the memory alloca- 

leaked. tion function call failed, and if so, the back-end processor is 

During operation, the error-checking engine initializes a 40 80 inf ornied (lines 7-10). 

pointer record for each pointer in a function upon activation. Referring to FIG. 28, a diagram illustrating a parse tree 

For any assignment of an address to a pointer, the pointer representation of the program segment shown in FIG. 27A 

record is updated to indicate that the pointer contains the is shown. The assignment operation (line 1) is represented 

address of an allocated memory block. Similarly, the by assignment node 280, which has two operands, a variable 

memory block record pointer list is updated to indicate that 45 operand (representing the left-hand side of the assignment) 

the pointer is pointing to that block. Finally, upon the exit and a function call expression operand (representing the 

from the routine, all pointer records are cleared. right-hand side of the assignment), represented respectively 

Referring to FIG. 26, a diagram illustrating an instru- °y nodes 281 and 282 In mrn > the cal1 expression node 282 

mented parse tree representation of the source code listing in has two operands, a function declaration for a "malloc( )" 

FIG. 23 is shown. The assignment of the local allocated 50 mnction and an integer constant, represented respectively by 

memory block to the local pointer memory variable in nodes 283 and 284. 

assignment node 242 indicates to the error-checking engine Referring to FIG. 29, a flow chart for a routine for 

that variable "ptr" is pointing to a particular memory block. inserting an interface is shown. Before an interface can be 

Subsequently, when the pointer goes out of scope, the used, it must first be pre-processed to convert it from source 

pointer record maintained by the error-checking engine is 55 code into an intermediate form and then stored in a database 

removed from the memory block record. Since that record for later use (block 29). A flow analysis is then performed on 

now has an empty list of pointers pointing to it, the error- the source code to identify function calls having a corre- 

checking engine can detect that memory has been leaked. sponding interface description to that stored in the database 

The instrumentation nodes for the declaration operation arc (block 291). If a matching function call is found (block 292), 

inserted as declare local pointer node 270. The instmmen- 60 the interface is inserted into the parse tree in a multi-step 

tation nodes for the pop scope operation are inserted as pop process. 

scope node 271. Declare local pointer node 270 indicates to This process includes the step of first removing the 

the error-checking engine that the local pointer memory existing function call from the parse tree (block 293). Next, 

variable "ptr" is declared and uninitialized. The pop scope the stored intermediate form for the corresponding interface 

node 271 indicates to the error-checking engine that "ptr" 65 is read from the database (block 294). The stored interme- 

has gone out of scope due to a return operation from the diate form is inserted as interface nodes in the parse tree in 

function. place of the node corresponding to the original function call 
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(block 295). The original function call arguments are sub- 
stituted into placeholders in the interface nodes (block 296). 
This enables the interface to actually perform the original 
function which is called within the interface function itself. 
Finally, the return statement and the interface node arc 
replaced with an assignment of the result of the interface 
routine to the actual call to the original function call (block 
297). This enables the original calling function to receive the 
result that was expected without the interface. 

Referring to FIG. 30, a diagram illustrating an instru- 
mented parse tree representation of the program segment 
shown in FIG. 27A is shown. This parse tree differs from 
those used in other parts of the invention. The original parse 
tree node representation has been broken into two sections 
and grafted onto the intermediate form for the interface 
routine. The entire parse tree shown in FIG. 30 is grafted in 
place of the parse tree shown in FIG. 28. 

The grafted parse tree segment shown in FIG. 30 reflects 
the program structure of the interface routine source code. 
The conditional check for a positive memory block size is 
inserted as "if conditional" node 300 with the conditional 
test represented by node 301 (line 4). The body of the 
conditional statement is represented by error notify node 302 
and error string node 303 (line 5) which communicate to the 
runtime back-end processor the occurrence of a dynamic 
runtime error condition. The original function call to the 
memory allocation routine "malloc( )" (line 6) is represented 
by assignment node 304 which has two operands, a variable 
and an expression call, represented respectively by nodes 
305 and 282. Note that nodes 282, 283 and 284 are substi- 
tuted into the interface intermediate form in the place of 
placeholders. 

The error check for a memory allocation operation failure 
is represented by "if conditional" node 306 with the condi- 
tional test represented by node 307 (line 7 and 9). The body 
of the "then" condition is represented by allocation notify 
node 308 (line 8) which tells the runtime processor that a 
block of memory of size "size" has been allocated and is 
pointed to by pointer "a". The body of the "else" condition 
is represented by error notify node 309 (line 10) which has 
one operand, error string node 310. Finally, an assignment 
condition is grafted to the end of the intermediate form to 
assign the result from the interface, represented by variable 
node 311, to the original function call. 

Instrumentation routines can also be used to insert support 
for automatic test case generation. One embodiment of the 
present invention is described in "Overview of the Design of 
TGS System," which is attached as Appendix B, the subject 
matter of which is hereby incorporated by reference as if set 
forth in full. By performing a flow analysis of the source 
code, a two-fold criteria can be satisfied. First, instrumen- 
tation routines can be inserted to automatically generate 
program inputs to achieve full testing of all flow paths in the 
executable program. Second, instrumentation routines can 
be used to identify inputs that cause the program to perform 
incorrectly. 

The method involves analyzing the source code to iden- 
tify points where input data is needed. Next, various tech- 
niques are employed, ranging from random number genera- 
tion to heuristic flow analysis techniques, to generate a set 
of input cases that satisfy the two-fold criteria stated above. 
The resulting executable program is linked to a test harness 
which repeatedly runs the program with different input 
values and adds unique test cases to a database of test case 
data. The testing algorithm converges when the two-fold 
criteria is met or when no new test cases can be generated 
in a reasonable amount of time. 
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Referring to FIG. 31, a source code listing of a computer 
function to be instrumented for automatic test case genera- 
tion is shown. The purpose of this function is to accept an 
input character and determine whether it is an integer. Three 

5 character memory variables arc declared, "b," "c" and 
pointer "ptr" (line 3). An input function "getchar( )" is called 
to obtain an input character whose value is assigned to 
variable "c" (line 4). That value is checked to determine 
whether it falls in the numeric range of ASCII codes for 

30 integer characters (line 5). If it does not, the pointer memory 
variable "ptr" is set to 0 (line 6), The variable "b" is assigned 
the difference of a "0" ASCII character code subtracted from 
the input character stored in variable "c" (line 8). A problem 
with this function is that the pointer memory variable "ptr" 
is not yet initialized before it is dereferenced by setting it to 

" 0. 

Referring to FIG. 32, a diagram illustrating a parse tree 
representation of the source code listing in FIG. 31 is shown. 
The function declaration (lines 1, 2 and 9) is represented by 
function declaration node 500. The declaration of variables 

20 "a" "b" and "ptr" (line 3) is represented by variables 
declared node 501. The assignment of the input character is 
represented by assignment node 502, which has two 
operands, a variable and a function call, represented respec- 
tively by nodes 503 and 504. The function call, represented 

25 by call expression node 504, has a single operand, function 
deel node 505 which contains the identifier for the "getchar( 
)" function. The conditional statement (line 5) is represented 
by if stmt node 506, which points to a node containing the 
conditions to be tested, represented by node 507. The body 

3Q of the condition node contains an assignment statement (line 
6) represented by assignment node 508, which has two 
operands, a pointer reference and an integer, respectively 
represented by nodes 509 and 510. 

Referring to FIG. 33, a diagram illustrating an instru- 
mented parse tree representation of the source code listing in 

35 FIG. 31 is shown. The instrumentation nodes for the decla- 
ration operation arc inserted as declare node 520. The 
instrumentation nodes for the character input generation 
operation are inserted as input generator node 521. The 
instrumentation nodes for the uninitialized pointer check 

40 operation are inserted as check pointer node 522, Declare 
node 520 indicates to the error-checking engine that the local 
pointer memory variable "ptr" is declared and uninitialized. 
The input generator node 521 indicates to the error-checking 
engine that the function call to "getchar( )" is replaced by a 

45 test case generator function which generates random inputs 
between 0 and 255. The check pointer node 522 indicates to 
the error-checking engine that the function is attempting to 
use a pointer memory variable that may not have been 
initialized previously and is probably pointing to an invalid 

so memory address. 

As will be realized, the present invention is capable of 
other and different embodiments and its several details are 
capable of modifications in various respects, all without 
departing from the spirit and scope of the present invention. 

55 Accordingly, the drawings and detailed description of the 
preferred embodiment are to be regarded as illustrative in 
nature and not as restrictive. 
What is claimed is: 

1. A method using a computer for instrumenting in real 
60 time a computer program source code to facilitate the 
detection of runtime errors, the computer program being 
represented by a parse tree, each such runtime error having 
at least one instrumentation rountine for communicating an 
occurence of the associated runtime error to an error- 
65 checking engine, each such instrumentation routine being 
represented by a parse tree fragment, comprising the steps 
of: 
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performing a real time flow analysis on the parse tree for 
the computer program to determine an appropriate 
instrumentation routine for detecting the runtime 
errors; 

instrumenting in real time the parse tree for the computer 5 
program by grafting the parse tree fragment for each 
such instrumentation routine onto the parse tree for the 
computer program; 

generating executable object code from the grafted parse 
tree containing references to the instrumentation rou- 10 
tines; and 

executing the generated object code on a computer by 
transferring the runtime errors to the error-checking 
engine and linking the error-checking engine with the 
computer program to determine errors. 15 

2. A method according to claim 1, wherein the runtime 
error comprises a read operation to an uninitialized memory 
variable, further comprising the steps of: 

retrieving a memory address for a memory variable from 
a stack frame represented as interconnected nodes 20 
stored in the parse tree and augumenting the parse tree 
with the instrumentation nodes for indicating that the 
memory variable is possibly uninitialized; 

identifying such a read operation to the memory address 
and augmenting the parse tree with the instrumentation 25 
nodes for indicating that the possibly unintialized 
memory variable is being read by the read operation; 
and 

identifying a write operation to the memory address and 3Q 
augmenting the parse tree with the instrumentation 
nodes for indicating that the possibly unintialized 
memory variable is initialized. 

3. A method according to claim I, wherein the runtime 
error comprises a write operation to an invalid memory 35 
address for a complex memory variable, further comprising 
the steps of: 

identifying a declaration operation of the complex 
memory variable comprising an identifier operand and 
a dimension operand that defines a memory address 4Q 
range and augmenting the parse tree with the instru- 
mentation nodes for storing the identifier operand and 
the dimension operand; and 

identifying such a write operation using the complex 
memory variable and augmenting the parse tree with 45 
instrumentation nodes for indicating an access by the 
write operation to at least one of such a memory 
address indentified by the indentifier operand that is 
invalid or such a memory address that falls outside of 
the memory address range defined by the dimension 50 
operand. 

4. A method according to claim 1, wherein the runtime 
error comprises a runtime memory allocation operation 
using a pointer memory variable, further comprising the 
steps of: 55 

identifying a declaration operation for such a pointer 
memory variable comprising an indentifier operand and 
augmenting the parse tree with instrumentation nodes 
for storing a pointer record for the pointer memory 
variable and a corresponding memory address; 60 

identifying such a runtime memory allocation operation 
for allocating a memory block to the pointer memory 
variable and augmenting the parse tree with instrumen- 
tation nodes for storing an allocation record for the 
memory block indentified by the indentifier operand; 65 

identifying an assignment operation assigning trie corre- 
sponding memory address to the pointer memory van- 



,029 

20 

able and augmenting the parse tree with instrumenta- 
tion nodes for storing the corresponding memory 
address into the pointer record; and 
augmenting the parse tree with instrumentation nodes for 
performing a runtime memory manipulation check 
using the pointer record and the allocation record. 

5. A method according to claim 1, wherein the runtime 
error comprises an inappropriate use of a pointer memory 
variable, further comprising the steps of: 

identifying a declaration operation for such a pointer 
memory variable and augmenting the parse tree with 
instrumentation nodes for storing a pointer record for 
the pointer memory variable and a corresponding 
memory address; 

identifying an operation using the pointer memory vari- 
able and augmenting the parse tree with instrumenta- 
tion nodes for indicating whether at least one of the 
operation is using the pointer memory variable when 
the corresponding memory address is equal to null, the 
operation is using the pointer memory variable when 
the corresponding memory address is unintialized, the 
operation is using the corresponding memory address 
not pointing to valid data, the operation is comparing 
the pointer memory variable not pointing to identical 
types of data or the operation is making a function call 
using the pointer memory variable not containing a 
valid function address. 

6. A method according to claim 1, wherein the runtime 
error comprises a memory leak, further comprising the steps 
of: 

identifying a declaration operation for a pointer memory 
variable and augmenting the parse tree with instrumen- 
tation nodes for storing a pointer record for the pointer 
memory variable; and 

identifying an exit from scope operation and augmenting 
the parse tree with instrumentation nodes for checking 
for memory leaks using the scope operation and the 
pointer record. 

7. A system for instrumenting a computer program to 
facilitate the detection of runtime errors, the computer 
program being represented by a parse tree, each such runt- 
ime error having at least one instrumentation routine for 
communicating an occurrence of the associated runtime 
error to an error-checking engine, each such instrumentation 
routine being represented by a parse tree fragment, com- 
prising: 

means for performing a real time flow analysis on the 
parse tree for the computer program to determine an 
appropriate instrumentation routine for detecting the 
runtime errors; 

means for instrumenting in real time the parse tree for the 
computer program by grafting the parse tree fragment 
for each such instrumentation routine onto the parse 
tree for the computer program; 

means for generating executable object code from the 
grafted parse tree containing references to the instru- 
mentation routines; and 

means for executing the generated object code by trans- 
ferring the runtime errors to an error-checking engine 
and linking the error-checking engine with the com- 
puter program to determine errors. 

8. A method using a computer for instrumenting an 
intermediate representation of a computer program for 
dynamic debugging, comprising the steps of: 

creating in real time the intermediate representation of the 
computer program in the computer by translating 
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source code comprising the computer program into an 
equivalent intermediate form; 

analyzing in real time the intermediate representation of 
the computer program for instruction flow for deter- 
mining an appropriate type of instrumentation for use 5 
in dynamic debugging; 

augmenting in real time the intermediate representation of 
the computer program with at least one further inter- 
mediate representation wherein the at least one further 
intermediate representation comprises the appropriate 30 
type of instrumentation translated into the equivalent 
intermediate form; 

generating executable object code from the augmented 
intermediate representation of the computer program ^ 
containing references to the instrumentation routines 
and storing the executable code in the computer; and 

executing the executable object code by transferring runt- 
ime errors to an error-checking engine and linking the 
error-checking engine with the computer program to 20 
determine errors. 

9. A method according to claim 8, wherein the step of 
creating the intermediate representation of the computer 
program further comprises the step of: 

building a parse tree as the equivalent intermediate form. 25 

10. A method according to claim 9, further comprising the 
steps of: 

lexically analyzing the source code of the computer 

program to form a sequence of tokens; 
parsing the sequence of tokens to form grammatical 30 

phrases; 

semantically analyzing the grammatical phrases to form 
the parse tree. 

11. A method according to claim 8, wherein the step of 35 
augmenting the intermediate representation of the computer 
program further comprises the steps of: 

inserting such intermediate representations of instrumen- 
tation for checking functionality of the source code of 
the computer program and communicating the func- 40 
tionality to the error-checking engine; and 

inserting such intermediate representations of instrumen- 
tation for communicating non-functional information 
regarding the source code of the computer program to 
the error-checking engine. 45 

12. A method according to claim 11, further comprising 
the steps of: 

communicating at least one of a declaration of variables, 

pointer addresses and pointer sizes; 
communicating assignments of pointers; 50 
communicating at least one of function entry and exit 

point indicators; and 
communicating memory allocation indicators. 

13. A method according to claim 11, further comprising 55 
the steps of: 

detecting a read operation to an uninitialized memory 
variable; 

detecting a write operation to an invalid memory address 
for a complex memory variable; 60 

detecting a dynamic memory manipulation error using a 
pointer memory variable; 

detecting an inappropriate use of a pointer memory vari- 
able; 65 

detecting a memory leak error; 

inserting a user-defined instrumentation routine; and 
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inserting an automatic test case generation routine. 

14. A method using a computer for instrumenting an 
intermediate representation of a computer program for 
dynamic debugging, comprising the steps of: 

creating the intermediate representation of the computer . 
program in the computer by translating source code 
comprising the computer program into an equivalent 
intermediate form consisting of at least an additional 
sequence of code instructions; 

analyzing in real time the additional sequence of code 
instructions to determine an appropriate type of instru- 
mentation for use in the dynamic debugging; 

augmenting in real time the additional sequence of code 
instructions with further code instructions to invoke the 
appropriate type of instrumentation; 

generating executable object code containing references 
to the instrumentation routines using the augmented 
sequence of additional code and storing the executable 
code in the computer, and 

executing the executable code by transferring runtime 
errors to an error-checking engine and linking the 
error-checking engine with the computer program to 
determine errors. 

15. A method according to claim 14 wherein the appro- 
priate type of instrumentation includes code directed to 
detecting a read operation to an uninitialized memory vari- 
able. 

16. A method according to claim 14 wherein the appro- 
priate type of instrumentation includes code directed to 
detecting a write operation to an invalid memory address for 
a complex memory variable. 

17. A method according to claim 14 wherein the appro- 
priate type of instrumentation includes code directed to 
detecting an inappropriate use of a pointer memory variable. 

18. A method according to claim 14 wherein the appro- 
priate type of instrumentation includes code directed to 
detecting dynamic memory manipulation error using a 
pointer memory variable. 

19. A method according to claim 14 wherein the appro- 
priate type of instrumentation includes code directed to 
detecting a memory leak error. 

20. A method according to claim 14 wherein the appro- 
priate type of instrumentation includes code corresponding 
to a user defined instrumentation routine. 

21. A method according to claim 14 wherein the appro- 
priate type of instrumentation includes code corresponding 
to an automatic test case generation routine. 

22. A method using a computer for dynamic debugging, 
comprising the steps of: 

initiating compilation of a computer program; 

interrupting the compilation of a computer program in 
order to capture the intermediate representation of the 
computer program created by the compilation process; 

augmenting in real time the intermediate representation of 
the computer program to add instrumentation for use in 
dynamic debugging; 

reinitiating the compilation process to generate execut- 
able object code containing references to the instru- 
mentation routines; 

executing the object code by transferring runtime errors to 
an error-checking engine and linking the error- 
checking engine with the computer program to deter- 
mine errors. 

23. A method using a computer for instrumenting in real 
time a computer program source code to facilitate the 
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detection of runtime errors, the computer program being 
represented by a parse tree, each such runtime error having 
at least one instrumentation routine for communicating an 
occurrence of the associated runtime error to an error- 
checking engine, each such instrumentation routine being 5 
represented by a parse tree fragment, comprising the steps 
of: 

performing a real time flow analysis on the parse tree for 
the computer program to determine an appropriate 
instrumentation routine for detecting the runtime 10 
errors; 

instrumenting in real time the parse tree for the computer 
program by grafting the parse tree fragment for each 
such instrumentation routine onto the parse tree for the 
computer program; 3S 

generating executable object code from the grafted parse 
tree containing references to the instrumentation rou- 
tines; and 

continuously executing the generated object code on the 
computer by transferring runtime errors to an error- 20 
checking engine and linking the error-checking engine 
with the computer program to determine errors. 

24. A method according to claim 23, wherein the runtime 
error comprises a read operation to an uninitialized memory 
variable, further comprising the steps of: 25 

retrieving a memory address for a memory variable from 
a stack frame represented as interconnected nodes 
stored in the parse tree and augmenting the parse tree 
with the instrumentation nodes for indicating that the 3Q 
memory variable is possibly uninitialized; 

identifying such a read operation to the memory address 
and augmenting the parse tree with the instrumentation 
nodes for indicating that the possibly uninitialized 
memory variable is being read by the read operation; 35 
and 

identifying a write operation to the memory address and 
augmenting the parse tree with the instrumentation 
nodes for indicating that the possibly uninitialized 
memory variable is initialized. 4Q 

25. A method according to claim 23, wherein the runtime 
error comprises a write operation to an invalid memory 
address for a complex memory variable, further comprising 
the steps of: 

identifying a declaration operation for the complex 45 
memory variable comprising an identifier operand and 
a dimension operand that defines a memory address 
range and augmenting the parse tree with the instru- 
mentation nodes for storing the identifier operand and 
the dimension operand; and 50 

identifying such a write operation using the complex 
memory variable and augmenting the parse tree with 
instrumentation nodes for indicating an access by the 
write operation to at least one of such a memory 
address identified by the identifier operand that is 55 
invalid or such a memory address that falls outside of 
the memory address range defined by the dimension 
operand. 

26. A method according to claim 23, wherein the runtime 
error comprises a runtime memory allocation operation 60 
using a pointer memory variable, further comprising the 
steps of: 

identifying a declaration operation for such a pointer 
memory variable comprising a identifier operand and 
augmenting the parse tree with instrumentation nodes 65 
for storing a pointer record for the pointer memory 
variable and a corresponding memory address; 
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identifying such a runtime memory allocation operation 
for allocating a memory block to the pointer memory 
variable and augmenting the parse tree with instrumen- 
tation nodes for storing an allocation record for the 
memory block identified by the identifier operand; 

identifying an assignment operation assigning the corre- 
sponding memory address to the pointer memory vari- 
able and augmenting the parse tree with instrumenta- 
tion nodes for storing the corresponding memory 
address into the pointer record; and 

augmenting the parse tree with instrumentation nodes for 
performing a runtime memory manipulation check 
using the pointer record and the allocation record. 

27. A method according to claim 23, wherein the runtime 
error comprises an inappropriate use of a pointer memory 
variable, further comprising the steps of: 

identifying a declaration operation for such a pointer 
memory variable and augmenting the parse tree with 
instrumentation nodes for storing a pointer record for 
the pointer memory variable and a corresponding 
memory address; 

identifying an operation using the pointer memory vari- 
able and augmenting the parse tree with instrumenta- 
tion nodes for indicating whether at least one of the 
operation is using the pointer memory variable when 
the corresponding memory address is equal to null, the 
operation is using the pointer memory variable when 
the corresponding memory address is uninitialized, the 
operation is using the corresponding memory address 
not pointing to valid data, the operation is comparing 
the pointer memory variable not pointing to identical 
types of data or the operation is making a function call 
using the pointer memory variable not containing a 
valid function address. 

28. A method according to claim 23, wherein the runtime 
error comprises a memory leak, further comprising the steps 
of: 

identifying a declaration operation for a pointer memory 
variable and augmenting the parse tree with instrumen- 
tation nodes for storing a pointer record for the pointer 
memory variable; and 

identifying an exit from scope operation and augmenting 
the parse tree with instrumentation nodes for checking 
for memory leaks using the scope operation and the 
pointer record. 

29. A system for instrumenting in real time a computer 
program source code to facilitate the detection of runtime 
errors, the computer program being represented by a parse 
tree, each such runtime error having *at least one instrumen- 
tation routine for communicating an occurrence of the 
associated runtime error to an error-checking engine, each 
such instrumentation routine being represented by a parse 
tree fragments comprising: 

means for performing a real time flow analysis on the 
parse tree for the computer program to determine an 
appropriate instrumentation routine for detecting the 
runtime errors; 

means for instrumenting in real time the parse tree for the 
computer program by grafting the parse tree fragment 
for each such instrumentation routine onto the parse 
tree for the computer program; 

means for generating executable object code from the 
grafted parse tree containing references to the instru- 
mentation routines; and 

means for continuously executing the generated object 
code by transferring runtime errors to an error-checking 
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engine and linking the error-checking engine with the 
computer program to determine errors. 

30. A method using a computer for instrumenting in real 
time an intermediate representation of a computer program 
for dynamic debugging, comprising the steps of: 5 

creating the intermediate representation of the computer 
program in the computer by translating source code 
comprising the computer program into an equivalent 
intermediate form; 

analyzing in real time the intermediate representation of 30 
the computer program for instruction flow for deter- 
mining an appropriate type of instrumentation for use 
in dynamic debugging; 

augmenting in real time the intermediate representation of 15 
the computer program with at least one further inter- 
mediate representation wherein the at least one further 
intermediate representation comprises the appropriate 
type of instrumentation translated into the equivalent 
intermediate form; 2Q 

generating executable object code containing references 
to the instrumentation routines from the augmented 
intermediate representation of the computer program 
and storing the executable code in the computer; and 

continuously executing the generated object code by 25 
transferring runtime errors to an error-checking engine 
and linking the error-checking engine with the com- 
puter program to determine errors. 

31. A method according to claim 30, wherein the step of 
creating the intermediate representation of the computer 30 
program further comprises the step of: 

building a parse tree as the equivalent intermediate form. 

32. A method according to claim 31, further comprising 
the steps of: 

lexically analyzing the source code of the computer 35 

program to form a sequence of tokens; 
parsing the sequence of tokens to form grammatical 

phrases; 

semantically analyzing the grammatical phrases to form 40 
the parse tree. 

33. A method according to claim 30, wherein the step of 
augmenting the intermediate representation of the computer 
program further comprises the steps of: 

inserting such intermediate representations of instrumen- 45 
tation for checking functionality of the source code of 
the computer program and communicating the func- 
tionality to the error-checking engine; and 

inserting such intermediate representations of instrumen- 
tation for communicating non-functional information 50 
regarding the source code of the computer program to 
the error-checking engine. 

34. A method according to claim 33, further comprising 
the steps of: 

communicating at least one of a declaration of variables, 55 

pointer addresses and pointer sizes; 
communicating assignments of pointers; 
communicating at least one of function entry and exit 

point indicators; and 

r 60 

communicating memory allocation indicators. 

35. A method according to claim 33, further comprising 
the steps of: 

detecting a read operation to an uninitialized memory 
variable; 65 

delecting a write operation to an invalid memory address 
for a complex memory variable; 
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detecting a dynamic memory manipulation error using a 
pointer memory variable; 

detecting an inappropriate use of a pointer memory vari- 
able; 

detecting a memory leak error; 

inserting a user-defined instrumentation routine; and 

inserting an automatic test case generation routine. 

36. A method using a computer for instrumenting in real 
time an intermediate representation of a computer program 
source code for dynamic debugging, comprising the steps 
of: 

creating the intermediate representation of the computer 
program in the computer by translating source code 
comprising the computer program into an equivalent 
intermediate form consisting of at least an additional 
sequence of code instructions; 

analyzing in real time the additional sequence of code 
instructions to determine an appropriate type of instru- 
mentation for use in the dynamic debugging; 

augmenting in real time the additional sequence of code 
instructions with further code instructions to invoke the 
appropriate type of instrumentation; 

generating executable object code containing references 
to the instrumentation routines using the augmented 
sequence of additional code and storing the executable 
code in the computer; and 

continuously executing the executable code by transfer- 
ring runtime errors to an error-checking engine and 
linking the error-checking engine with the computer 
program to determine errors. 

37. A method according to claim 36 wherein the appro- 
priate type of instrumentation includes code directed to 
detecting a read operation to an uninitialized memory vari- 
able. 

38. A method according to claim 36 wherein the appro- 
priate type of instrumentation includes code directed to 
detecting a write operation to an invalid memory address for 
a complex memory variable. 

39. A method according to claim 36 wherein the appro- 
priate type of instrumentation includes code directed to 
detecting an inappropriate use of a pointer memory variable. 

40. A method according to claim 36 wherein the appro- 
priate type of instrumentation includes code directed to 
detecting dynamic memory manipulation error using a 
pointer memory variable. 

41. A method according to claim 36 wherein the appro- 
priate type of instrumentation includes code directed to 
detecting a memory leak error. 

42. A method according to claim 36 wherein the appro- 
priate type of instrumentation includes code corresponding 
to a user defined instrumentation routine. 

43. A method according to claim 36 wherein the appro- 
priate type of instrumentation includes code corresponding 
to an automatic test case generation routine. 

44. A method using a computer for dynamic debugging, 
comprising the steps of: 

initiating compilation of a computer program; 

interrupting the compilation of a computer program in 
order to capture the intermediate representation of the 
computer program created by the compilation process; 

augmenting in real time the intermediate representation of 
the computer program to add instrumentation for use in 
dynamic debugging; 

reinitiating the compilation process to generate execut- 
able object code containing references to the instru- 
mentation routines; 
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continuously executing the object code by transferring 
runtime errors to an error-checking engine and linking 
the error-checking engine with the computer program 
to determine errors. 

45. A method using a computer for instrumenting an 
intermediate representation of a computer program for 
dynamic debugging, comprising the steps of: 

creating the intermediate representation of the computer 
program in the computer by translating source code 
comprising the computer program into an equivalent 
intermediate form; 

analyzing in real time the intermediate representation of 
the computer program for instruction flow for deter- 
mining an appropriate type of instrumentation for use 
in dynamic debugging; 

augmenting in real time the intermediate representation of 
the computer program with at least one further inter- 
mediate representation wherein the at least one further 
intermediate representation comprises the appropriate 
type of instrumentation translated into the equivalent 
intermediate form; 

generating executable object code containing references 
to the instrumentation routines from the augmented 
intermediate representation of the computer program 
and storing the executable code in the computer; and 

executing the code executable in real time by transferring 
runtime errors to an error-checking engine and linking 
the error-checking engine with the computer program 
to determine errors. 

46. A method according to claim 45, wherein the step of 
creating the intermediate representation of the computer 
program further comprises the step of: 

building a parse tree as the equivalent intermediate form. 

47. A method according to claim 46, further comprising 
the steps of: 

lexically analyzing the source code of the computer 

program to form a sequence of tokens; 
parsing the sequence of tokens to form grammatical 

phrases; 



10 



20 



30 



35 



semantically analyzing the grammatical phrases to form 
the parse tree. 

48. A method according to claim 45, wherein the step of 
augmenting the intermediate representation of the computer 
program further comprises the steps of: 

inserting such intermediate representations of instrumen- 
tation for checking functionality of the source code of 
the computer program and communicating the func- 
tionality to the error-checking engine; and 

inserting such intermediate representations of instrumen- 
tation for communicating non- functional information 
regarding the source code of the computer program to 
the error-checking engine. 

49. A method according to claim 48, further comprising 
the steps of: 

communicating at least one of a declaration of variables, 

pointer addresses and pointer sizes; 
communicating assignments of pointers; 
communicating at least one of function entry and exit 

point indicators; and 
communicating memory allocation indicators. 

50. A method according to claim 48, further comprising 
the steps of: 

detecting a read operation to an uninitialized memory 
variable; 

detecting a write operation to an invalid memory address 
for a complex memory variable; 

detecting a dynamic memory manipulation error using a 
pointer memory variable; 

detecting an inappropriate use of a pointer memory vari- 
able; 

detecting a memory leak error; 

inserting a user-defined instrumentation routine; and 

inserting an automatic test case generation routine. 
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Fig. 4(A) 



TYPE: LINDA (Generic) 
COOE (OPERATOR): LOAD 

DATATYPE: il, 12. 14. ul. u2, u4. r4. r8, rl6. c8, cl6. c32 
OPERANDS 

1: std, prg 

2: axd, bxd, var 
CONDITIONS 

BOUNDARY: X (Unguaranteed) 

ADDRESS REPRESENTATION: 3 

TYPE: LINDA (Generic) 
CODE (OPERATOR): ADD 

DATATYPE: il, i2, 14. ul. u2. u4. r4. r8. rl6. c8, cl6. c32 
OPERANDS 

1: std, prg, var 

2: axd. bxd, var 

3: std, prg, var, cnt 
CONDITIONS 

CONSTANT RANGE: X (Unlimited) 

TYPE: LINDA (Generic) 
CODE (OPERATOR): cmove 
DATA TYPE: char 
OPERANDS 

1: axd, bxd 

2: axd, bxd 
CONDITIONS 
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Fig. 4(B) 



TYPE: LINDA (Generic) 

CODE (OPERATOR): LOAD 

DATA TYPE: il, i2, 14. ul. u2. u4. r4. r8 

OPERANDS 

1: std, prg 

2: axd, bxd 
CONDITIONS 

BOUNDARY: axd, bxd (Boundaries for axd and bxd must be 
su&rcin^eed* ) 

ADDRESS REPRESENTATION: 2 (Memory address with two values) 

TYPE: LINDA (Generic) 

CODE (OPERATOR): ADD 

DATA TYPE: il. 12. 14. ul, u2. u4. r4. r8 

OPERANDS 

1: std, prg 

2: std, prg 

3: std, prg, cnt 
CONDITIONS 

CONSTANT RANGE: -1095<=cnt<=4095 

: cmove has been elimidated. 
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Fig. 5 
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1 ] 1_ — ' 

19 | Leaf Routine Optimization | I O I 
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20 | Peephole Optimization J Oj O l 

21 I Branch Optimization J T I 
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Fig. 6(A) 



ENTRY No. : 1 

OPTIMIZATION NAME: constant folding 

APPLICABLE TYPE OF INTERMEDIATE REPRESENTATION: LINDA 

OPERATOR: * (Don't care.) 

OPERANDS: cnt 
INTERMEDIATE REPRESENTATION DEPENDENCIES: 

CONSTANT RANGE: * 
APPLICATION CONDITIONS: 

Times: - 
DEPENDENCIES: 



ENTRY No. : 4 
OPTIMIZATION NAME: CSE 

APPLICABLE TYPE OF INTERMEDIATE REPRESENTATION: LINDA 
OPERATOR: * 

OPERANDS: std. prg, var, cnt 
INTERMEDIATE REPRESENTATION DEPENDENCIES: 

CONSTANT RANGE: * (Don* t care) 

BOUNDARY: X 

ADORESS: * 
APPLICATION CONDITIONS: 

Times: - 
DEPENDENCIES: 



ENTRY No. : 14 

OPTIMIZATION NAME: User Function Inline 
FUNCTION NAME: 

APPLICABLE TYPE OF INTERMEDIATE REPRESENTATION: LINDA 

OPERATOR: * 

OPERANDS: * 
INTERMEDIATE REPRESENTATION DEPENDENCIES 
APPLICATION CONDITIONS: 

Times: 1 
DEPENDENCIES: 

Before 1, 2 13, 15 22 
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Fig. 6(B) 



ENTRY No. : 16 

OPTIMIZATION NAME: Inner Loop Unrolling 

APPLICABLE TYPE OF INTERMEDIATE REPRESENTATION: LINDA 

OPERATOR: * 

OPERANDS: * 
INTERMEDIATE REPRESENTATION DEPENDENCIES: 
APPLICATION CONDITIONS: 

Times: 1 
DEPENDENCIES: 

After: 14, 1, 2, ... 

ENTRY No. : 18 

OPTIMIZATION NAME: Global register allocation 
APPLICABLE TYPE OF INTERMEDIATE REPRESENTATION: LINDA 

OPERATOR: * 

OPERANDS: * 
INTERMEDIATE REPRESENTATION DEPENDENCIES: 

CONSTANT RANGE: * (Don' t care) 

BOUNDARY: X 

ADORESS: 2 
APPLICATION CONDITIONS: 

Times: 1 
DEPENDENCIES: 

Before: 22 

After: 1, 2, 3, 4 17, 19 21 

ENTRY No.: 22 

OPTIMIZATION NAME: Instruction Scheduling 
APPLICABLE TYPE OF INTERMEDIATE REPRESENTATION: LINDA 

OPERATOR: * 

OPERANDS: * 
INTERMEDIATE REPRESENTATION DEPENDENCIES 

CONSTANT RANGE: * (Don't care.) 

BOUNDARY: bxd, axd 

ADORESS: 2 
APPLICATION CONDITIONS: 

Times: 1 
DEPENDENCIES: 

Before: 6 

After: 1, 2, 3, 4, ... 
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Fig. 7(A) 



Table of intermediate representations 

struct SET J ML { 

IMLJFYPE Type of intermediate representation; 

CODEJYPE Operation code 

LIST List of types 

LIST List of operands (n) 

struct Conditions { 

NUMBER address; 

RANGE const; 

FLAGS boundary: 
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Fig. 7(B) 



Data structure of table of optimization function 

struct OPTIMIZE JBL { 

NIM3ER Optimization function number 
NAME Optimization function name 

IMLJfYPE Type of intermediate representation 
LIST List of applied operators 

LIST List of applied operands 

struct cond_tag { 

NUMBER address; 

RANGE const; 

FLAGS boundary; 

} 

struct opt_cond_tag { 
NUMBER times; 

} 

List Before; 
List After; 
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Fig. 8 
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Fig. 9 



A SOURCE PROGRAM AND THE OPERATIONS OF A COMPILER 
ACCORDING TO AN EMBODIMENT OF THE PRESENT INVENTION 

struct aaa { 

int ii.jj; 

double fd; 
} s_l.S_2; 

double zz [100] , xx; 

void sub(n, x) 
int n; 
double x; 

{ 

double dd [100]; 

int i; 

s 1 = s 2; 

for (i - 6 ; i < n ; i++) 

dd [i] = dd [i] + (xx+ x) = zz [i] ; 
/* s_l. i i = s_l. i i - s_l. jj; * / 

subl(dd); 
return; 

} 
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FRONT-END OUTPUTS OF THE EMBODIMENT 
JflabelJ: 

cmove axd:struct:("s 1") [0.0] axdCs 2") [0,0] 
#label_2: 

move var :(T) cnt:0 

bge #label_9 var(T) var(V) 

mult ~ "std#17d784 varCD cnt:8 

load std#17d7e8 axdCdd") [0, std:#17d784] 

add std:#17d820 var:Cxx") var:("x") 

mult std:#17d858 vanCi") cnt:8 

load std:#17d8bc axdCzz") [0 std:#17d858] 

mult std:#17d8f4 std:#17d820 std:#17d8bc 

mult std:#17d92c var:(T) cnt:8 

add std:#17d990 std:#17d7e8 std:#17d8f4 

store axd(-dd-) [0. std:#17d92c] std:#17d990 

load std:#17da74 axdCsJ") [0] 

load std:#17dad8 axdCsJ") [4] 

sub std:#17db3c std:#17da74: std:#17dad8 

store axdCsJ") [0] std:#17db3c: (i4) 

#1abel_7: 

move std:#17dc80 var:("i") 

add var:("i") std:#17dc80 cnt:l 

bit #label_6 vanCi") var:(V) 

#label_9: 
raturn 
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Fig.12 



INTERMEDIATE REPRESENTATIONS AFTER INTERMEDIATE REPRESENTATION 
CHANGE (IE1) OF THE EMBODIMENT (PART 1) 

KlabeLO: 

entry prg:#18fe28 prg:#18feac prg:#18ff5c 

store bxd:CV) [Xfp+68] prg:#18fe28 

store bxd:(ted:#1704b8) [Xfp+72] prg:#18feac 

store bxd:(ted:#1704b8 [Xfp+76] prg:#18ff5c 

#label_l: 

movehi std:#18fffc adcrvar: ("s_r)+0 

or std:#18fa8c std:#18fffc adc:( - s_r)+0 

movehi std:#1900b4 abc: ("s_2")+0 

or std:#18fae4 std:#1900b4 adc: ("s_2'*)+0 

add prg:#19016c std:#18fa8c cnt:0 

add prg:#190190 std:#18fae4 cnt:0 

move prg:#1901dc cnt:16 

#label_14: . 

sub prg:1901dc prg:#1901dc cnt:l 

load std:#1904a0 bxd:rs_2 w ) [prg:#190190] +0 prg:#1901dc] 
store bxd:Cs_r) [prg:#19016c] +0 prg:#1901dc] std:#1904a0 
cmp prg:#18fdc8 prg:#1901dc cnt:0 
bne #labeL14(00190304) prg:#18fdc8 

#label_2: 

load std:#190e8c bxd:(V) [ftfp+68] 

add prg:#183e3c std:#190e8c cnt:l 

sub prg:#1847d4 std:#183e3c cnt*-l 

load std:#190f2c bxd:(V) [Xfp+68] 

cmp prg:#1904e8 std:#190f2c cnt:0 

ble #label_9 P rg:#1904e8 

llabel 10: 

movehi std:#191000 adc: ("XX")+0 

load std:#190ff0 bxd:("XX") [std:#191000] +0 adc("XX79)+0] 

load std:#1910e0 bxd:(T) [Xfp+723 

add prg:#17d818 std:*190ffO std:#1910e0 

move prg:# 184188 cnt:0 

srl prg:#183e2c prg:#1847d4 cnt:l 

and std:#18480c prg:#1847d4 cnt:l 

cmp prg:#1904e8 std:tfl8480c cnt:0 

beq #labe1_3 prg:#1904e8 
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Fig .13 

INTERMEDIATE REPRESENTATIONS AFTER INTERMEDIATE REPRESENTATION 
CHANGE (IE1) OF THE EMBODIMENT (PART 2) 

#label_ll: 

add std:#1902e4 var: (Xfp)prg:# 184188 

load std:#184924 bxd:("dd") [std:#1902e4] +-800] 

movehi std : if 19126c adc: Czz76)+0 

or std:#190340 std:#19126c adc("zz")+0 

load std:#1849a8 bxd:("zz") [std:#190340] +0 prg:#184188] 

mult std:#184a04 prg:#17d818 std:#1849a8 

add std:#184a64 std:#184924 std:#184a04 

store bxd("dd") [std:#1902e4] +-800] std:#184a64 

add prg:#184188 prg:#184188 cnt:#1701cc: 14:8 
cmp prg:#1904e8 prg:#183e2c cnt:#1700fc:l4:8 
beq #label_9 prg:#1904e8 



Klabel_3: 

add prg : 190198 var (Xfp) prg:#184188 
#label_6: 

load std:#17d7e0 bxdCdd") [ P rg:#190198] +-800] 

movehi std : #191480 adc ("zz")+0 

or std:#18fd80 std:#191480 adc(-zz")+0 

load std:#17d8b4 bxd:("zz") [std:#18fd80] +0 prg:#1841883 

mult std:#17d8ec prg:#17d818 std:#17d8b4 

add std:#17d988 std:#17d7e0 std:#17d8ec 

store bxd:("dd") [prg: #190198] +-800] std:#17d988 

load std:#184c4c bxd("dd") [prg:#190198] +-792] 

movehi std:#191620 adc("zz")+0 

or std:#18fe98 std:#191620 adc(-zz")+0 

add std:#18fef4 std:#18fe98 prg:#184188 . 

load std:#184cac bxdCzz") Cstd:#18f ef4] +8] 

mult std:#184d08 prg:#17d818 std:#184cac 

add std:#184d68 std:#184c4c std:#184d08 

store bxd("dd") [prg:#190198] +-7923 std:#184d68 

#label_7: 

add prg:#184188 prg:#184188 cnt:16 

add prg:#190198 prg:#190198 cnt:16 

sub prg:#183e2c prg:#183e2c cnt:l 

cmp prg:#1904e8 prg:#183e2c cnt:0 

bet #labe1 6 prg:#1904e8 

#label_9:(0017dc04) /epl/term 

return none 
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Fig. 14 

OUTPUTS AFTER OPTIMIZATION B OF THE EMBODIMENT (PART 1) 
JlabeLO: 

entry prg:#18fe28 prg:#18feac prg:#18ff5c 
store bxd:(V) [Xfp+68] prg:K18fe28 
store bxd:(ted:#1704b8) [Xfp+72] prg:ffl8feac 
store bxd:(ted:017O4b8)[Xfp+76] Prg:/fl8ff5c 

JflabeLl: 

movehi std:018fffc adc:var: ("s_l")+0 

or std:#18fa8c std:#18fffc adc:("s_D+0 

movehi std:#1900b4 abc: ("s_2")+0 

or std:#18fae4 std:#1900b4 adc: Cs_J2")+0 

add prg:#19016c std:018fa8c cnt:0 

add prg: #190190 std:#18fae4 cnt:0 

move prg:#1901dc cnt:16 

fllabel_14: 

sub (prg:ffl90904 prg:#1904f0) prg:«190904 cnt:»17ce38:u4:l 
load std:ffl90bc8 bxd:("s_2") [prg:#1908b8] +0 prg:#190904] 
store bxd:CsJ") [prg:Jf 190894] +0 prg:ffl90904] std:#190bc8 
bne fflabel 14 P rg:«1904f0 

#label_2: 

add "std:ffl83e44 prg:#190550 cnt:l 
sub prg:#1847dc std:»183e44 cnt:l 
cmp prg:#1904f0 prg:fll90550 cnt:0 
ble #labeL9 prg:#1904f0 

fflabeUO: 

movehi std:#191003 adc:("XX")+0 

load prg:#1924c0 bxdiCXX") [std:f 191008] +0 adc("XX")+0] 

load std:«1910e8 bxdrCX") [Xfp+72] 

add prg:«17d820 prg:#1924c0 std:#1910e8 

move prg:#184190 cnt:0 

srl prg:#183e34 prg:K1847dc cnt:l 

and (std:«184314.prg:<fl904f0) prg:ffl847dc cnt:#170130:14:l 

beq «;label_3 prg:#1904f0 

#label_ll (00184638) 

add std:J[1902ec var(Xfp) prg:*184190 

load std:#18492c bxd("dd") [std:#1902ec] +-800] 

movehi std:IH91274 adc:("zz")+0 

or std:#190348 std:#191274 adc("zz")+0 

load std:#1849b0 bxdCzz") Cstd: #1903483 +0 prg:#184190] 

mult std:fl84a0c prg:#17d820 std:ffl849bO 

add std:#184a6c std:018492c std:#184a0c 

store bxdCdd") [std:#1902ec] +-800 std:#184a6c 

add prg:#184190 Prg:fll84190 cnt:8 
cmp prg:|1904f0 prg:#183e34 cnt:0 
beq #label_9(0017d39c) prg:jK1904fO 
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Fig. 15 



OUTPUTS AFTER OPTIMIZATION B OF THE EMBODIMENT (PART 2) 
#labeL3(0017d2e8) 

add prg:#1901a0 varftfp) prg:#184190 

movehi std:#191488 adc("zz")+0 

or prg:#18fd88 std:#191488 adc("zz")+0 

#label_6(0017d6a0) /entr 
load std:#17d7e8 bxdCdd") [prg:#1901a03 +-800] 
load std:#17d8bc bxd("zz") [prg:#18f d883 +0 prg:#1841903 
mult std:#17d8f4 prg:#17d820 std:#17d8bc 
add std:#17d990 std:#17d7e8 std:#17d8f4 
store bxdCdd") [prg:#1901a03 +-800 ] std:#17d990 

load std:#184c54 bxdCdd") [prg:#1901a03 +-792 ] 

add std:#18fefc prg:#18fd88 prg:#184190 

load std:#184cb4 bxdCzz") [std:#18fefc3 +8 ] 

mult std:#184dl0 prg:#17d820 std:#184cb4 

add std:#184d70 std:#184c54 std:#184dl0 

store bxd(-dd") [prg:#1901a0] +-792 3 std:#184d70 

#1abe1 7: 

add " *prg:#184190 prg:#184190 cnt:16 

add prg:#1901a0 prg:#1901a0 cnt:16 

sub (prg:#183e34,prg:#1904f0) prg:#183e34 cnt:l 

bgt #albel_6 prg:#1904f0 

#label_9: 
return none 
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COMPILING APPARATUS AND A COMPILING 
METHOD 

BACKGROUND OF THE INVENTION 
I. Field of the Invention 

The present invention relates to a compiling appara- 
tus for improving the performance of and optimizing 
compiler. 

Recent computers are required to operate at high 
speed To improve the operation speed, processors that 
simultaneously execute a plurality of instructions and 
machines that carry out vector operations have been 
developed. These situations have raised the needs of 15 
optimizing compilers. 

Compilers translate a source program written in a 
programming language such as C, FORTRAN, and 
COBOL into another language such as assembler. The 



The intermediate representations optimized by the 
optimizing section are independent of languages and 
machines. 

There are compilers that change the structure of 
5 intermediate representations during optimization. Such 
compilers are explained in a reference 3 "Optimizing 
compilers for SPARC" by S. S. Muchnick, Sun Technol- 
ogy. 1988, and in a reference 4 "A global optimizer 
for Sun FORTRAN, C and Pascal" by Ghodssi, Vida, 
10 S. S. Muchnick, and Alex Wu, USENIX Conference, • 
1986. These compilers mainly substitute existing pro- 
cesses for partial processes during compilation and 
change the physical memory map of data structure of 
intermediate representations. 

Japanese Unexamined Patent Publication No. 
2-81137 discloses the structure of an optimizing com- 
piler. Objects of this disclosure are to (1) efficiently 
optimize a source program, (2) efficiently collect opti- 
mization data, (3) quickly find optimization functions to 



"optimization" of compiling means to translate a source 20 achieve, (4) eliminate side effect elements of the source 



25 



30 



program in such a way as to improve the execution 
speed of a translated program and reduce a memory 
area required by the translated program, without 
changing the meaning of the source program. 

Generally, compilers optimize a source program 
through intermediate representations, which are synon- 
ymous with intermediate languages, intermediate texts, 
or internal representations. The intermediate represen- 
tations are independent of a computer or a work station 
that executes a translated program. Some of the inter- 
mediate representations are eliminated, moved, and 
integrated through optimization, to thereby provide 
optimized object codes. The selection of intermediate 
representations, therefore, affects the performance and 35 
speed of optimization. If the types and contents of the 
intermediate representations are fixed according to a 
specific stand point, the optimization will frequently be 
unsuccessful because there are various kinds of optimi- 
zation and final objects that will not fit the fixed inter- 40 
mediate representations. 

The present invention relates to optimizing a com- 
piler that translates a prograrnming language into an 
assembler language or a machine language. 

2. Description of the Related Art 

Optimization means deleting, moving, and integrat- 
ing intermediate representations, to provide high-speed 
small-sized efficient object code. 

The intermediate representations must be indepen- 
dent of languages and machines when achieving the 
optimization and providing common code generators 
and standardized parts. The intermediate representa- 
tions are explained in, for example, a reference 1 "Com- 
piler: principles, techniques, and tools" by A.V. Aho et 
al, Science, and a reference 2 "A way of thinking of 
intermediate languages," Interface, March 1989, pp 211 
to 219. 

Conventional optimization involves a front end, an 
optimizing section, and a code output section. 

The front end provides intermediate representations 
according to a source program. The intermediate repre- 
sentations are optimized by the optimizing section, to 
improve the execution speed of the program and reduce 
the size of the program. The optimized result is supplied 
to the code output section, which provides code based 
on a machine-dependent programming language such as 
a machine language dedicated to a target machine. 
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program, and (5) clear side effects of the source pro- 
gram. This prior art initially changes an intermediate 
language (intermediate representations) and optimizes 
the changed one. 

As explained above, the conventional compilers de- 
sign and prepare an optimizing section according to 
fixed intermediate representations, causing the follow- 
ing problems: 

(1) Machine instructions and intermediate representa- 
tions do not always have a one-to-one relationship. 
For example, if an instruction scheduling function 
that determines the positions of different machine 
instructions is compiled under a single intermediate 
representation, the intermediate representation will 
be unable to optimize the different machine instruc- 
tions involved in the function, 

(2) Even if all intermediate representations provided 
by a front end are related to machine instructions, 
respectively, there will still be the following prob- 
lems: 

(a) The intermediate representations provided by 
the front end must be independent of a target 
machine. If not so, different intermediate repre- 
sentations must be prepared for different target 
machines, and the front end must be renewed 
whenever a compiler is prepared. 

(b) Relating instructions and intermediate represen- 
tations to each other in a one-to-one relationship 
in the front end increases the number of the inter- 
mediate representations, to thereby complicate 
optimization and elongate an optimization time. 
For example, if a target machine is designed for 
32-bit instructions and if it is impossible to load 
or store a 32-bit address at one time, two instruc- 
tions are combined to substitute for a load/store 
instruction to address 32 bits. If the front end 
relates the two instructions to different interme- 
diate representations, it is necessary to recognize 
during optimization that the two specific instruc- 
tions serve as the single load/store instruction. 
This is not as simple as recognizing a single load 
or store instruction and complicates the optimi- 
zation. 

(3) Each optimization function is applied only to a 
predetermined intermediate representation. This 
puts limits on the front end and on optimization. 

The intermediate representation changing techniques 
disclosed in the references 3 and 4 completely change 



12/02/2003, EAST Version: 1.4.1 



SUMMARY OF THE INVENTION 



5,396,631 

3 4 

the physical data structures of intermediate representa- and changing processes a predetermined number of 
tions. Accordingly, these techniques hardly standardize times. 

optimization functions. The optimizing unit and intermediate representation 

changing unit may have a variable identifying a present 
5 phase when repeating the same optimization. 
An object of the present invention is to provide a The compiling apparatus may have a selection switch 
compiling apparatus and a compiling method that meet to determine whether or not an optional one of the 
a variety of requirements for optimization. A compiler optimization functions must be achieved in an optional 
according to the present invention provides a high-per- one of the optimizing phases. 

formance object code according to a target architec- 10 The selection switch may be externally controlled to 
ture, a source program, and optimization requirements. stop an optional optimization function in an optional 
The present invention standardizes intermediate repre- phase. 

sentation changing processes and optimizing processes, The compiling apparatus may have a printing unit for 
to provide an efficient compiler. printing intermediate representations in each phase and 

In order to accomplish the objects, the present inven- 15 a selection switch for selecting the printing unit* 
tion provides a compiling apparatus having a front end The selection switch may be externally controlled to 
for providing intermediate representations according to print intermediate representations at optional timing in 
a source program, an optimizing unit for optimizing the any one of the phases. 

intermediate representations, at least one intermediate ™„„ « „ „ „,™„ 

representation changing unit for changing the opti- 20 BRIEF DESCRIPTION OF THE DRAWINGS 

mized intermediate representations, and a code output The present invention will be more clearly under- 
unit for providing codes according to the last obtained stood from the description as set forth below with refer- 
intermediate representations. The compiling apparatus ence to the accompanying drawings, in which: 
is characterized by an optimizing structure determina- FIG. 1 explains a conventional optimizing compiler; 
tion unit for detenriining the number of repetitions of an FIG. 2 shows a principle of the present invention; 
optimization phase achieved by the optimizing unit and FIG. 3 explains an optimizing structure determina- 
intermediate representation changing unit and selecting tion flow; 

optimization functions carried out in each of the optimi- FIG. 4{A) shows examples of intermediate represen- 

zation phases. According to the determination by the 3Q tations at an input end of optimization; 

optimizing structure determination unit, the intermedi- FIG. 4(B) shows examples of intermediate represen- 

ate representations are changed and optimized in each tations at an output end of the optimization; 

of the optimization phases. FIG. 5 shows optimization functions and their 

The optimizing structure determination unit deter- classes; 
mines the number of the intermediate representation 35 FIG. 6(A) shows examples of optimization data 
changing and optimizing processes according to the stored in a table; 

program-dependent intermediate representations pro- FIG. 6(B) shows other examples of optimization data 
vided by the front end, object-architecture-dependent stored in a table; 

intermediate representations, and the selected optimiza- FIG. 7(A) shows a data structure in a table used for 
tion functions. 40 changing intermediate representations; 

The physical memory map structure of main data of FIG. 7(B) shows a data structure in a table used for 
the intermediate representations during the intermedi- optimization; 

ate representation changing and optimizing processes is FIG. 8 shows a compiler according to an embodi- 
fixed. ment of the present invention; 

The fixed physical memory map structure helps stan- 45 FIG. 9 shows an example of a program used for ex- 
dardize the intermediate representation changing and plaining the operation of the compiler according to the 
optimizing processes. embodiment of the present invention; 

The optimizing structure determination unit may FIG. 10 shows outputs of a front end with respect to 
select a plurality of optimizing structures, separately the program, according to the embodiment of the pres- 
carry out optimization according to each of the selected 50 ent invention; 

structures, and select the most proper one among the FIG. 11 shows outputs after optimization A accord- 
optimizing structures according to the optimization ing to the embodiment; 

results. FIG. 12 shows intermediate representations after a 

The optimizing unit and intermediate representation change (IE1) according to the embodiment; 
changing unit may repeat predetermined optimization. 55 FIG. 13 shows other intermediate representations 

The compiling apparatus may have a switch that after the change (IE1); 
connects and disconnects the optimizing structure de- FIG. 14 shows outputs after optimization B accord- 
termination unit in front of the optimizing unit and ing to the embodiment; 

intermediate representation changing unit. FIG. 15 shows other outputs after the optimization B; 

When the switch is turned ON, the optimizing struc- 60 and 
ture determination unit determines an optimizing struc- FIG. 16 explains the data structure of an intermediate 
ture in each compiling phase. According to the optimiz- representation. 

ing structure, intermediate representations correspond- ncoronyrinxT r^-c tut ___ T) _ t? _. 

ing to a source program are changed, and optimization DESCRIPTION OF THE PREFERRED 

function to be applied are selected. 65 EMBODIMENTS 

When the switch is turned OFF, the optimizing unit Before describing the preferred embodiments accord- 
and intermediate representation changing unit repeat ing to the present invention, an example of the related 
predetermined intermediate representation optimizing art is provided. 
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FIG. 1 explains optimization according to the prior also obtains optimization data such as required optimi- 
art. A front end 100 provides intermediate representa- zation functions from the optimization data file 11. 
tions according to a source program, so that the source Then, the unit 4 determines the number of intermediate 
program can be compiled. An optimizing section 101 representation optimizing and changing processes. For 
optimizes the intermediate representations provided by 5 example, for a simple optimizing structure, the unit 4 
the front end 100, to improve the speed of the source may determine one time of optimizing and changing 
program and reduce the size of the source program. A source-program-dependent intermediate representa- 
result of the optimization is supplied to a code output tions, and one time of optimizing and changing target- 
section 102, which provides code in a machine-depend- machine-dependent intermediate representations, 
ent program language such as a machine language. 10 Once the optimizing structure determination unit 4 

The intermediate representations optimized by the selects the optimization functions to achieve and the 

optimizing section 101 are independent of languages number of intermediate representation optimizing and 

and machines. changing processes, these results are informed to the 

FIG. 2 shows a principle of the present invention. optimizing unit 5, intermediate representation changing 

In the figure, numeral 1 is a source program, 2 is a 15 unit 6, and determination unit 7. At first, the optimizing 
front end for providing intermediate representations unit 5 optimizes the source-program-dependent inter- 
corresponding to the source program 1, 3 is a switch mediate representations according to the intermediate 
that is usually turned ON to determine an optimizing representation structure data 12 provided by the front 
structure, 4 is an optimizing structure determination end 2. Thereafter, the intermediate representation 
unit, 5 is an optimizing unit, 6 is an intermediate repre- 20 changing unit 6 changes a part or the whole of the 
sentation changing unit, 7 is a determination unit, 8 is a optimized intermediate representations, to provide the 
code output unit, and 9 is an object program. The opti- intermediate representation structure data 13. The de- 
mizing structure determination unit 4 may be arranged termination unit 7 repeats a loop of the intermediate 
inside or outside a compiler. Determined results of the representation optimizing and changing processes until 
unit 4 are provided to the optimizing unit 5 and interme- 25 the number deterrnined by the optimizing structure 
diate representation changing unit 6. determination unit 4 is counted. In each loop, the opti- 

An intermediate representation data file 10 stores mizing unit 5 carries out the specified optimization func- 
intermediate representation data used for providing first tions, and the intermediate representation changing unit 
intermediate representations from the front end 2, or for 6 changes the optimized intermediate representations, 
providing an object program. An optimization data file 30 When the determination unit 7 counts the predeter- 
11 stores optimization data used for determining an mined number, the code output unit 8 provides codes to 
optimizing structure. The file 11 also stores determined form the object program 9 in machine language accord- 
optimization data. Intermediate representation structure ing to the last provided intermediate representation 
data 12 is provided by the front end 2. Intermediate structure data 13. 

representation structure data 13 is provided by the in- 35 Each of the optimizing unit 5 and intermediate repre- 
termediate representation changing unit 6. sentation changing unit 6 may have a function of deter- 
According to the source program 1 provided to the mining whether or not the predetermined number of the 
front end 2 and according to the target machine, the intermediate representation optimizing and changing 
optimizing structure determination unit 4 selects optimi- processes have been completed. In this case, the deter- 
zation functions to achieve and determine the number of 40 mination unit 7 may be omitted, 
repetitions of an intermediate representation changing The optimizing structure determination unit 4 is able 
process. According to the determination, a loop opera- to repeatedly change and optimize intermediate repre- 
tion of the optimizing unit 5 and intermediate represen- sentations, so that optimization functions that have not 
tation changing unit 6 is repeated. The same optimiza- been applied to first intermediate representations may 
tion may be repeated several times, to provide a high- 45 be applied to secondly changed intermediate represen- 
performance object program corresponding to the tations. Instruction scheduling may be carried out and 
source program and target machine. optimized in a stage where machine instructions and 
In FIG. 2, the front end 2 changes the source pro- intermediate representations have a one-to-one relation- 
gram 1 into the intermediate representations. The ship. This results in improving the optimizing perfor- 
switch 3 determines whether or not the optimizing 50 mance of the compiler. 

structure determination unit 4 is used. The switch 3 may The optimizing unit 5 may have an intermediate rep- 
be optionally turned ON and OFF according to re- resentation changing function, to omit the intermediate 
quired optimization levels. representation changing unit 6. 

When the switch 3 is OFF, the optimizing unit 5 and FIG. 3 shows the flow of steps for determining an 

intermediate representation changing unit 6 repeat a 55 optimizing structure and related data, 

predetermined optimizing phase involving optimization Data 20 corresponds to the data 10 of FIG. 1 and 

and intermediate representation change a predeter- indicates the characteristics of source-program-depend- 

mi neri number of times. The optimizing phase is pre- ent intermediate representations provided by the front 

pared according to the source program 1. When the end. Data 21 indicates the characteristics of target- 
determination unit 7 counts the predetermined number 60 machine-dependent intermediate representations sup- 

of times, the code output unit 8 prepares codes accord- plied to the code output unit that provides object codes 

ing to the last optimized and changed intermediate rep- after optimization. Optimization data 22 corresponds to 

resentations. These codes form the object program 9. the data 11 of FIG. 1 and is used to determine optimiza- 

When the switch 3 is ON, the optimizing structure tion functions and the number of repetitions of interme- 
determination unit 4 arranged inside or outside the com- 65 diate representation optimizing and changing processes, 

piler obtains preliminary supplied front end intermedi- In FIG. 3, step SI picks up optimization functions to 

ate representation data and target machine data from be held in a compiler. The optimization data 22 includes 

the intermediate representation data file 10. The unit 4 a list of optimization functions corresponding to the 
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front-end intermediate representation data 20 and code- FIGS. 4(A) and 4(B) show examples of intermediate 

providing intermediate representation data 21, and the representations at I/O sections of a compiler, in which 

step SI selects some of the optimization functions from FIG. 4(A) shows intermediate representations provided 

the data 22, so that they are executed by the optimizing by a front end, and FIG. 4(B) shows intermediate repre- 

compiler. This selection may be made by a person who 5 sentations for providing output codes. The intermediate 

prepares the compiler. representations of FIG. 4(A) are dependent on a source 

Step S2 classifies the selected optimization functions program, and those of FIG. 4(B) are dependent on a 

into execution classes. At this time, the optimization target architecture. These intermediate representations 

functions are classified into those that are achieved on correspond to the data 20 and 21 of FIG. 3 used to 

the front-end intermediate representations, those that 10 determine an optimizing structure, 

are achieved on the code-providing intermediate repre- FIG. 4(A) shows the attributes of each intermediate 

sentations, those that are achieved between the above representation corresponding to a code (an operator), 

two cases, and those that are achieved in instruction The attributes include the type, code, data types, oper- 

scheduling. The optimization functions are classified ands, and conditions of the intermediate representation, 

according to: 15 Taking the first intermediate representation of Fig. 

(1) the physical memory map structures of data of 4(A) as an example, the type is a generic LINDA, the 
intermediate representations to be optimized, and code is an operator LOAD; the data types are II to 14 

(2) objects to be optimized. representing signed data of one to four bytes, ul to u4 
The item (2) means that a given optimization function representing unsigned data of one to four bytes, r8 to 

is classified according to whether the function is carried 20 rl6 representing floating-point data of 8 to 16 bytes, and 

out on a language or program structure such as a loop c8 to c32 representing fixed-point data of 8 to 32 bytes, 

and a conditional statement, on a target structure such the operands are one or two, and the conditions are a 

as a parallel structure and a register structure, or on boundary (an unguaranteed boundary X) and an address 

both of them. The classified result is stored in the opti- 25 representation of three values. Similarly, intermediate 

mization data 22. representations for codes (operators) ADD and cmove 

Step S3 determines a compiler structure in two steps are shown in FIG. 4(A). 
S30 and S31. The step S30 determines the number of FIG. 4(B) shows the intermediate representations for 
times of changing intermediate representations. In prin- providing codes that are dependent on the target archi- 
ciple, this number corresponds to the number of classes ^ tecture. Although the intermediate representations of 
of the classified optimization functions. The number FIG. 4(B) resemble those of FIG. 4(A), the contents of 
may also be determined according to the types of inter- them do not always agree with each other. For exam- 
mediate representations and the intermediate represen- pie, the data types of the code LOAD of FIG. 4(B) do 
tation dependencies of the optimization functions. not include rl6 and c8 to c32, and the intermediate 

The step S31 determines the optimization functions in 35 representation for the code "cmove" is not included in 

three steps S310, S311, and S312. The step S310 sorts FIG. 4(B). 

the optimization functions on execution phases with FIG. 5 shows optimization functions to be achieved 

reference to the types of intermediate representations by the compiler. These functions are picked up in the 

and the intermediate representation dependencies. The step SI of FIG. 3. In FIG. 5, each of the optimization 

step S311 extracts applicable ones among the optimiza- 40 functions numbered from 1 to 22 has a title. For exam- 

tion functions according to the application conditions of pie, the optimization function No. 1 is titled "CON- 

the functions. The step S312 determines the execution STANT FOLDING" to fold constants. This function 

order of the optimization functions according to the folds, for example, "1+3" as "4." The optimization 

front-end and code-providing dependencies of the tunc- function No. 2 is titled "CONSTANT PROPAGA- 

tions. 45 TION" to propagate constants. Reference marks MPA, 

Once the optimizing structure is determined, the in- MPB, and SCH shown in FIG. 5 will be explained later, 

termediate representation optimizing and changing pro- These data are in the. form of a table or a database and 

cesses are carried out according to the determined re- are stored as the optimization data 22 of FIG. 3. 

suits. Namely, the intermediate representations are FIGS. 6(A) and 6(B) show examples of the optimiza- 

changed and optimized phase by phase, to provide 50 tion functions stored in the table. The figures show only 

codes that form an optimized object program. data for entry numbers 1, 4, 14, 16, 18, and 22. These 

The processes of FIG. 3 for detennining an optimiz* entry numbers correspond to the optimization function 

ing structure will be explained in detail with reference numbers shown in FIG. 5. Taking the entry number 1 as 

to FIGS. 4(A) to 7(B). an example, the title is "constant folding", the type of 

This explanation relates to a compiler for a parallel 55 applicable intermediate representation is "LINDA", an 
RISC (reduced instruction set computer). 10 According "" operator is (don't care) to mean that this optimiza- 
to the RISC architecture, a memory is accessed only tion function is irrelevant to operators, an operand is 
with a load/store instruction, and operations are always "cnt" (a constant), and intermediate representation de- 
carried out between registers, to improve hardware pendencies are shown. 

speed. The parallel RISC architecture has a function of 60 The entry number 4 has the title "CSE" (common 

simultaneously executing a plurality of instructions. subexpression elimination) and, as an intermediate rep- 

Although the following explanation relates to the com- resentation dependency, a boundary of X to indicate 

piler for the parallel RISC architecture, the present unguaranteed In this way, each of the optimization 

invention is applicable for any other processors. In the functions corresponding to the entry numbers 14, 16, 18, 

following explanation, a front end is prepared for a 65 and 22 has data shown in FIGS. 6(A) and 6(B). 

language, with no regard to targets. This provision is According to the data shown in FIGS. 4(A), 4(B), 

right because the specifications of a language are un- 6(A), and 6(B), the optimizing structure deterrnining 

changed for different targets. flow of FIG. 3 is carried out. 
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If the intermediate representations provided by the 

front end and the intermediate representations for pro- boundary: bxd, axd address i 

viding codes completely agree with each other, it is not (referred to as phase 3) 

necessary to change the intermediate representations. If boundary: x address 2 

they do not agree with each other as shown in FIGS. (referred to as phase 2) 
4(A) and 4(B), the optimizing structure detenmning 

flow of FIG. 3 is carried out. In addition, there is the following output from the front 

Namely, the step SI of FIG. 3 picks up optimization end: 
functions as shown in FIG. 5. The step S2 of KEG. 3 io 

classifies the optimization functions according to the boundary: x address 3 

conditions (1) and (2) explained above. Since this em- (referred to as phase l) 
bodiment employs the same physical memory map 

structure for data of each intermediate representation, it Accordingly, the intermediate representations must be 

, *i. . * • \ : xl 15 changed three tunes, 

is not necessary to classify the optimization functions — « & „ c -, rT ? T ~ -» • j 

/ . ' * * . The step S31 of FIG. 3 sets the tuning and positions 

according to the condition (I). Accordmg to the condi- tQ om ^ optimkation functions . 

tion (2), the language- or program structure-dependent According to the types of the intermediate represen- 

ophmization functions are achieved on a source pro- totions ^ the intermediate representation dependen- 

gram structure or on variables, and the architecture- 20 cies of the optimization functions shown in Figs. 6(A) 

dependent optimization functions are achieved on the and 6(B), the step S310 of FIG. 3 sorts the optimization 

specifications of a target machine architecture. functions on the phases 1, 2, and 3. The entry numbers 

Memory addressing instructions such as a register (function numbers) 1, 4, 14, and so on of FIGS, 6(A) and 

allocation instruction and an instruction scheduling 6(B) are sorted as follows: 

instruction are dependent on architecture. Accordingly, 25 Phase 1: 1, 4, 14, 16 

optimization functions carried out on these instructions pj^e 3- 22 ^ ^ ^ 

are also the architecture-dependent optimization func- ~, . ' - - - . 

, r r _ , The step 311 of FIG. 3 extracts proper ones among 

tions. Accordmg to the embodiment of the present in- these optimization factions according to application 

vention, the architecture-dependent optimization func- 39 conditions as follows: 

tions are classified into two as follows: Phase 1: 1, 4, 14, 16 

(1) optimization functions that manipulate instruc- Phase 2: 1, 4, 18 
tions themselves, e.g., reducing the number of in- Phase 3: 22 

structions and changing an instruction into another The step 312 of FIG. 3 determines the execution 

of higher speed (for example, changing a multipli- 35 o fder of the optimization functions according to the 

cation instruction into a repetition of addition) dependencies as follows: 

(2) optimization functions related to the order of V Q 16 

instructions ^ ' ^ 

_ . . . „ . ^ „ m Phase 3: 22 where (1, 4) means either of 1 or 4 may be 

The optimization functions 1 to 22 of FIG. 5 are ^ carried QUt at first 

classified according to these criteria into classes MPA, ^ ^ the optimizing structure is determined. 

MPB, and SCH. Here, the class MPA includes lan- According to the determination, the intermediate repre- 

guage-dependent (source-program-dependent) optimi- sentations are automatically optimized and changed. 

zation functions to be carried out on language-depend- For automatization, the following must be prepared: 

ent intermediate representations, the class MPB in- 45 1) A data structure each element of which corre- 

cludes architecture-dependent (target-machine-depend- sponds to an item of a table 

ent) optimization functions to be carried out on archi- 2 ) Data encoded into a string of characters or num- 

tecture-dependent intermediate representations, and the be held 33 ^ mternal structure of the com " 

class SCH includes optimization functions related to the „ _P I /ANU , , , , , , . ^ 

„ . . , , . 50 FIG. 7(A) shows a table used to change intermediate 

execution order of instructions among the architecture- representations> md mQm 7(B) shows a ^ used for 

dependent optimization functions. carrying out an optimization function. In the figures, 
In FIG. 5, any optimization function with a plurality IML is m intermediate representation. The table of 
of marks 0 is repeatedly carried out on different kinds of fig. 7(A) for a given intermediate representation con- 
intermediate representations. 55 tains the type of the intermediate representation (IML- 
After the classification, the step S3 of FIG. 3 deter- TYPE), an operation code (CODE-TYPE), a list of 
mines a compiling structure. The step S30 determines data types, a list of operands, etc. The table of FIG. 7(B) 
the number of times of changing intermediate represen- f° r a given optimization function contains an optimiza- 
tations according to the types of intermediate represen- tion Action number, the name of the optimization 
tations and the number of intermediate representation 60 Action, the type of an intermediate representation on 
j j • * • xrr^o */-a\ a *ro\ which the optimization function is earned out, a list of 
dependencies shown m FIGS. 6(A) and 6(B). v , *\ . A ' . . 
; t . « , . * apphed operators, etc. These data are stored in the 
In this embodiment, the intermediate representations ^ia file 11 of FIG. 2. 
are of only one type, LINDA. The number of interme- ^ embodiment of the present invention will be ex- 
diate representation dependencies is two for the 65 plained with reference to FIGS. 8 to 16. 
"boundary" excluding those with a mark (don't FIG. 8 shows the structure of a compiler according 
care). The two intermediate representation dependen- to the embodiment. This structure is provided by the 
cies are as follows: optimizing structure determining flow of FIG. 3. 
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The optimizing structure of a compiler for a program tions of the respective optimizing structures are 

for the parallel RISC can be obtained as explained with stored. 

reference to FIGS. 4(A) and 4(B) to 7(A) and 7(B). The An execution time is estimated for each of the listed 

structure of FIG. 8 is obtainable in the same manner. elements, i.e., for a string of the intermediate representa- 

The structure of FIG. 8 carries out optimization three 5 tions of each of the optimizing structures, and object 

times in phases 1 to 3, which correspond to the classes codes are prepared according to the intermediate repre- 

MPA, MPB, and SCH of FIG. 5. Intermediate repre- sentations that provide the shortest execution time. The 

sentations are changed three times, accordingly. execution time is calculable because instructions corre- 

Among the three times of changing, the first one is sponding to the intermediate representations are 

carried out by a front end to change a source program 10 known, and the execution time of each of these instruc- 

into first intermediate representations. tions k known because the execution time is architec- 

In FIG. 8, the front end 70 provides the first intenne- 111131 data required for instruction scheduling. If there 
diate representations according to the source program. ^ loo P s and number of loops is statically known, 
Optimization A (71) optimizes the first intermediate ^ execution time of instructions will be multiplied by 
representations provided by the front end 70. The opti- 15 ^ number of loops. If the number of loops is unknown, 
mization A involves the optimization functions classi- ^ execution time may be multiplied by the same con- 
ned into the class MP A (phase 1) of FIG. 5. Interface stant ihis case ' no ^ error wil1 because 
expansion IE1 (72) changes the optimized intermediate evef y code K ^ 88016 condition, 
representations, to provide intermediate representations . ^ flow of FlG \ 1 3 <** * incorporated as a compil- 
of target instruction level. Optimization B (73) opti- 20 mg phase m a cx>mpder, to select optmiizafton funcUons 
mizes these intermediate representations. according to a source program. 

The optimization B involves the optimization func- * d * mon f. functloIls of embodiment of FIG. 8 

tions classified into the class MPB (phase 2) of FIG. 5. ^ be ex P lainea ; . 

Interface expansion IE2 (74) changes the optimized „ According to the embodiment of FIG. 8, mtermed.- 

intermediate representations, to provide intermediate 25 ate representations provided by the front end are re- 

r^^ XtS 

^ . ^ . ^ ~ c A , t c^tt / T_ <>\ whether or not the intermediate representation optimiz- 
the optaruzatton function 22 of the class SCH (phase 3) J0 m ^ chan ^ of a ^ orresponding P phase 

of FIG. 5. According to a result of tins optimization must be ^ ^ switches ^ 

function a code output sectoon 76 provides codes. from QUtside of ^ ^ t0 st the pTOcesses ofaa 

Accorduig to this embodiment, the front end pro- jional hase ^ ^ provided ^ ^ 

vides the first intermediate representations Generally, option of ^ compiler . If ^ option stands> ^ „„. 
these intermediate representations provided by the 35 sponding phase ^ passed (re turned). This reduces a 
front end are irrelevant to a target If not so, different translation ^ md unloves the efficiency of debug- 
intermediate representations must be prepared for dif- g^g when preparing a compiler, 
ferent targets, and therefore, a new front, end must be ^ arrangement of FIG. 8 may have a function of 
prepared whenever a compiler is formed. printing intermediate representations. Namely, a selec- 

Wnen adding a new optimization function to the w tion switch may be arranged to determine whether or 
structure of FIG. 8, or when changing the specifications not intermediate representations in each phase are 
of optimization, the steps SI to S3 of FIG. 3 may be printed. This selection switch is different from the selec- 
repeated to determine a compiler structure. If the num- ti on switch for selecting whether or not the optimizing 
ber of classes of optimization functions is increased, a and changing processes of a given phase are carried out. 
step of changing intermediate representations will be 45 The printing function of intermediate representations 
added, to form a compiler structure that provides a may be called by controlling an option during the de- 
high-perfonnance object. bugging of a program or the processing of the compiler, 

The optimizing structure determination flow of FIG. to print intermediate representations at an optional loca- 
3 may be carried out by externally selecting and setting tion of an optional phase, to improve debugging effi- 
required data. If the specifications of optimization func- 50 ciency. 

tions are registered in tables as shown in FIGS. 7(A) FIGS. 9 to 16 show examples of intermediate repre- 
and 7(B), proper ones of the functions may 10 be auto- sentations at each optimization phase with respect to a 
matically picked up. In this case, the optimizing struc- program, according to the compiling structure of FIG. 
ture determination may be omitted. Namely, this case 8. 

corresponds to tiirning OFF the switch 3 of FIG. 2. 55 FIG. 9 shows a program for explaining the operation 
It is possible to select a plurality of optiinizing struc- of the compiler according to the embodiment, FIG. 10 
tures and separately operate them. According to the shows outputs of the front end with respect to the pro- 
results of the operations, the best one of the optimizing gram, FIG. 11 shows outputs after the optimization A, 
structures will be selected to provide a high-perfor- FIGS. 12 and 13 are intermediate representations after 
mance object 60 the intermediate representation change IE1, and FIGS. 

When operating the plurality of optimizing struc- 14 and 15 show outputs after the optimization B. 
tures, the following conditions are applied: Essential ones of the marks shown in FIGS. 10 to 16 

(1) If there are a plurality of processors, they will be will be explained. 

allocated for the optimizing structures, respec- #label-n (0<=n) is a base block. Intermediate texts 
tively. 65 forming the block follow the label. Each of the interme- 

(2) If there is one processor, the optirnizing structures diate texts includes an operator and a string of operands, 
are operated one after another before providing The operator is separated from the operands by a space, 
codes. Lists of resultant intermediate representa- The operands are as follows: 
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var: a variable FIGS. 14 and 15 show results of the optimization B. 

std, prg: a temporary variable produced by the com- Comparing the intermediate representations after the 

piler interface expansion IE1 of FIGS. 12 and 13 with the 

axd, bxd: an array, structure data results of the optimization B of FIGS. 14 and 15, it will 

cnt: a constant 5 be understood that basic optimization is achieved after 

The front end provides intermediate representations the interface expansion IE1. 

shown in FIG. 10 with respect to the source program of According to the embodiment of FIG. 8, the func- 

FIG. 9. In FIG. 10, the intermediate representations and tions of the optimization A and those of the optimiza- 

instructions do not always correspond to each other in tion B are standardized. According to another embodi- 

a one-to-one relationship. This will result in insufficient 10 ment, the optimization A and optimization B may be 

optimization. achieved with separate functions. 

With respect to these outputs of the front end, the When changing the intermediate representations by 

optimization A (71 of FIG. 8) is carried out. Namely, the phase IE1 (72 of FIG. 8) of the embodiment of FIG. 

the optimization functions classified into the class MPA $ t the instructions and intermediate representations are 

of FIG. 5 are achieved on the front-end outputs. FIG. 15 not completely in a one-to-one relationship. This is 

11 shows results of the optimization A. Comparing because processes after the phase IE1 are classified into 

FIGS. 10 and 11 with each other, the effect of the opti- the following two groups according to functions and 

mization is apparent. Namely, a variable XX and array data to use: 

elements dd(i) are collectively handled. A structure (i) fh e optimization B and register assignment 

assignment is achieved with a single intermediate repre- (2) Instruction scheduling 

sentation. The item (1) above relates to providing instructions as 

The intermediate representations provided by the follows: 
optimization A of FIG. 11 are changed by the interface ( a ) Deleting dead instructions 
expansion IE1 (72 of FIG. 8) into intermediate represen- %5 (b) Collecting overlapping instructions 
tations shown in FIGS. 12 and 13. The interface expan- ( c ) Providing speedier instructions 
sion IE1 changes the intermediate representations pro- The i tem (2) above does not relate to providing in- 
vided by the optimization A according to target instruc- structions but improves hardware efficiency by prop- 
tion sets and the optimization functions included in the er j y se tting the execution order of instructions. Accord- 
next optimization B (73 of FIG. 8). An object of the 3Q it ^ pre f e rable not to put the instructions and 
interface expansion IE1 is to provide clear intermediate intermediate representations in a one-to-one relation- 
representations as an object of the optimization B. snip j ust after ^ intermediate representation changing 

As shown in FIGS. 12 and 13, the XX and dd(i) have of tne pnase 
increased the number of intermediate representations In the mstruct i on scheduling, a double-precision 
and changed the operands according to addressing. A 35 load/store instruction must be divided into two single- 
structure assignment statement has been changed into a precision instructions. The reason is because these two 
loop structure. The intermediate representation change mstruct ions can be executed in parallel with each other, 
by the interface expansion IE1 is achievable according Name l y , these instructions will be more quickly corn- 
to branch tables because an output is uniquely deter- leted if ^ m s tel executed ^ tf they m 
mmed once the code and operands of a corresponding 40 ntiall executed. In terms of register assignment 
mput mtermediate representation are determined. Other ^ 0 ^ mizatioilf howeV er, the two single-precision 
techniques are also employable instructions originally indicate a single entity. Accord- 
f??T °l T Tn^r ingly, if they « separately handled, the operation of 
mi of FIGS. 12 and 13 the optimization B (73 of F G. ^ ^er during the optirnization A will be compli- 

8) is earned out Obiects of the optimization B are tar- , , , ,. . . , * 

o;ibtdmcu uuu ^jecu, „i uic upuimcauuu . «c 45 d fa addltl0 n, in terms of the register assignment, 

get-dependent parts that do not directly appear in the ^ ' * u « * ^ r Tu ,~ 

_* *i. «. i. ~* 5«„i,„^ :« consecutive registers must be allocated for the two 

source program, or parts that have not been analyzed in . ^_ * . . . ... 

the front end. As shown in FIG. 5, these parts overlap 8 ? >Mte ^mictions, so that information uidicatmg that 

those for the optimization A. According to this embodi- * e * wo lnstructlons are continuous must be provided 

ment, the following items are also objects of the optuni- 50 for *\ register assignment. 

zation B* On the other hand, the intermediate representation 

1. Optimization of address calculations changing phase IE2 (74 of FIG. 8) relates to the instruc- 

2. Loop processes of structure assignment tion scheduling, so that the intermediate representations 
According to this embodiment, the functions of the md actual instructions are decomposed to form one-to- 
optimization B are basically the same as those of the 5 5 one relationship. This is achieved as in the case of the 
optimization A. Accordingly, the optimization B is intermediate representation changing phase IE1 (72 of 
achieved by calling the same functions as those for the FI ^- 

optimization A as much as possible. For this purpose, According to the embodiment, the instruction sched- 

the types of the intermediate representations must be ulin g is carried out after the register assignment, ac- 

considered. The repetition of the optimization will be 60 cording to a known technique. Instead of carrying out 

explained later. the instruction scheduling after the register assignment, 

The prior art does not carry out the optimization B of the following approaches 1 to 4 may be employable: 

the present invention. Namely, the prior art does not 1- Carrying out the instruction scheduling before the 

achieve the optimization functions included in the opti- register assignment 

mization B, so that, if the RISC architecture is a target, 65 2. Simultaneously carrying out the instruction sched- 

a high-performance object will not be provided. Ac- uling and register assignment 

cordingly, the interface expansion IE1 (72 of FIG. 8) 3. Repeating the register assignment and instruction 

and optimization B are indispensable processes. scheduling in optional order 
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4. Repeating the register assignment and instruction 
scheduling in optional order and carrying out opti- 
mization during the repetition. 
In each of these cases, instructions and intermediate 
representations are related to each other in a one-to-one 5 
relationship before the instruction scheduling, to deter- 
mine how to change the intermediate representations, 
i.e., to determine the timing of the instruction schedul- 
ing. 

The types of intermediate representations will be 1° 
explained. According to the embodiment, each interme- 
diate representation has a data structure shown in FIG. 
16. To make an actual program according to the struc- 
ture of FIG. 16, the following structure declaration is 
made, so that the number of operands becomes variable 
depending on an operator. 

struct OperandType operand[l]; 

In this way, the physical memory map structure of 
intermediate representations can be fixed between pha- 
ses of a compiler. 

Changing intermediate representations will be ex- 
plained. In the following explanation, the "instructional 
intermediate representation" means an intermediate 
representation corresponding to an instruction. For 
example, #label-2 of FIG. 10 contains the following 
parts: 

move var:("l") cnt: 0 

bqe #label-o var:("l") var:("n") Here, "move 
var:("l") cnt:0" is an instructional intermediate 
representation. The "move" is an operator or an 
instructional intermediate representation code, the 
"var;("l")" is an operand 1 (first operand), and 
"cnt:0" is an operand 2 (second operand). Chang- 
ing an intermediate representation means: 

(1) Changing an operand pointed to by the intermedi- 
ate representation. In FIGS. 9 to 15, examples of 
this case will be seen as changes in operands of 
instructional intermediate representations. This is 
carried out by shifting the pointer. In FIG. 10, 

. changes in operands of an instructional intermedin 

ate representation are as follows: 
"move var:("l") cont:0" of #label-2 in the front end 

output of FIG. 10 is changed to "move 

prg:#17cddc("l") cnt:0" of #iabel-2 in the output 45 

of the optimization A of FIG. 11. 

(2) Changing the operator of the intermediate repre- 
sentation 

(3) Changing the output method of the intermediate 
representation. This means to provide a plurality of 50 
instructional intermediate representations or a loop 
structure according to an instructional intermedi- 
ate representation. For example, "cmove" in 
FIGS. 10 and 11 is changed into a loop of #label-14 
after the phase IE1 as shown in FIG. 12 and after 55 
the optimization B as shown in FIG. 14. 

In this way, the meanings of the intermediate repre- 
sentations are changeable without changing their data 
structures. 

Repetition of optimization will now be explained. 

This embodiment standardizes intermediate represen 
tations to be optimized, to carry out uniform optimiza- 
tion in each optimization phase. A variable indicating 
the presently processed phase is used, and individual 
optimizing routines are standardized. 

The optimization can be repeatedly called in the same 
phase. The repetition of the optimization is achievable 
in the following methods: 
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(1) A person who makes a compiler determines the 
repetition. 

(2) A flag indicating whether or not intermediate 
representations have been changed is prepared. 
This flag is initially OFF. If intermediate represen- 
tations are changed during optimization, the flag is 
turned ON. When the flag is ON, the flag is initial- 
ized, and the optimization is repeated. 

(3) Optimization is repeated the predetermined num- 
ber of times, which may be determined by a person 
who prepares a compiler, or may be optionally set 
from the outside. 

(4) The methods (1) to (3) are combined. 

This embodiment employs the method (1). Any one 
of the repetition methods (1) to (4) may be selected by 
a user according to an optimization level. 

As explained above, the present invention provides 
an optimizing compiler proper for a given architecture. 
The optimizing compiler determines a compiler struc- 
ture, to optimize a source program. The present inven- 
tion carries out optimization at a proper position ac- 
cording to the characteristics of the source program and 
architecture, to provide a high-performance object. 

The present invention is capable of repeating optimi- 
zation through compiling phases, to provide a high-per- 
formance object. 

The present invention carries out the same optimizing 
and intermediate representation changing processes in 
respective sections of a compiler, to thereby improve 
the efficiency of the compiler. 

What is claimed is: 

1. A compiling apparatus in a data processing appara- 
tus, comprising: 

a front end providing intermediate representations 
described by an intermediate language according 
to a source program; 

optimizing means for optimizing the intermediate 
representations from the front end in accordance 
with predetermined optimization functions; 

at least one intermediate representation changing 
means for further transforming the optimized inter- 
mediate representations from the optimizing means 
in accordance with predetermined optimization 
functions; 

code output means for providing code according to 
the intermediate representations finally obtained by 
the optimization from the intermediate representa- 
tion changing means; and 

optimizing structure determination means for deter- 
mining a number of repetitions of an optimization 
phase achieved by the optimizing means and inter- 
mediate representation changing means and select- 
ing the optimization functions for directing an opti- 
mizing process carried out in each of the optimiza- 
tion phases, and thereby the intermediate represen- 
tations being changed and optimized. 

2. The compiling apparatus according to claim 1, 
wherein the optimizing structure determination means 
determines the number of times of changing the inter- 
mediate representations and selects optimization func- 
tions to be achieved in each phase according to the 
intermediate representations dependent on the source 
program provided by the front end, intermediate repre- 
sentations dependent on an object architecture, and the 
applied optimization functions. 

3. The compiling apparatus according to claim 2, 
further comprising a switch that connects and discon- 
nects the optimizing structure determination means in 
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front of the optimizing means and intermediate repre- 
sentation changing means, 
the switch being turned ON to determine an optimiz- 
ing structure in a compiling phase, change interme- 
diate representations according to the source pro- 5 
gram, and change optimization functions to be 
applied. 

4. The compiling apparatus according to claim 3, 
wherein, when the switch is turned OFF, the optimiz- 
ing means and intermediate representation changing 10 
means repeat predetermined intermediate representa- 
tion optimizing and changing processes a predeter- 
mined number of times. 

5. The compiling apparatus according to claim 4, 
wherein the optimizing means and intermediate repre- 15 
sentation changing means have a variable identifying a 
present phase when repeating the same optimization. 

6. The compiling apparatus according to claim 4, 
further comprising a selection switch to determine 
whether or not an optional one of the optimization 20 
functions must be achieved in an optional one of the 
optimizing phases, the selection switch being externally 
controlled to stop an optional optimization function in 
an optional phase. 

7. The compiling apparatus according to claim 3, 25 
further comprising a selection switch to determine 
whether or not an optional one of the optimization 
functions must be achieved in an optional one of the 
optimizing phases, the selection switch being externally 
controlled to stop an optional optimization function in 30 
an optional phase. 

8. The compiling apparatus according to claim 2, 
wherein the optimizing means and intermediate repre- 
sentation changing means repeat the same predeter- 
mined optimization. 35 

9. The compiling apparatus according to claim 2, 
wherein a physical memory map structure of main data 
of the intermediate representations during the interme- 
diate representation changing and optimizing processes 

is fixed; and 40 
the fixed physical memory map structure are com- 
monly used by the intermediate representation 
changing and optimizing processes. 

10. The compiling apparatus according to claim 2, 
wherein the optimizing structure determination means 45 
selects a plurality of optimizing structures, separately 
carries out optimization according to each of the se- 
lected structures, and selects the best optimizing struc- 
ture among the optimizing structures according to the 
optimization results. 50 

11. The compiling apparatus according to claim 1, 
further comprising a switch that connects and discon- 
nects the optimizing structure determination means in 
front of the optimizing means and intermediate repre- 
sentation changing means, 55 

the switch being turned ON to determine an optimiz- 
ing structure in a compiling phase, change interme- 
diate representations according to the source pro- 
gram, and change optimization functions to be 
applied. 60 

12. The compiling apparatus according to claim 11, 
wherein, when the switch is turned OFF, the optimiz- 
ing means and intermediate representation changing 
means repeat predetermined intermediate representa- 
tion optimizing and changing processes a predeter- 65 
mined number of times. 

13. The compiling apparatus according to claim 12, 
wherein the optimizing means and intermediate repre- 



sentation changing means have a variable identifying a 
present phase when repeating the same optimization. 

14. The compiling apparatus according to claim 12, 
further comprising a selection switch to detennine 
whether or not an optional one of the optimization 
functions must be achieved in an optional one of the 
optimizing phases, the selection switch being externally 
controlled to stop an optional optimization function in 
an optional phase. 

15. The compiling apparatus according to claim 11, 
further comprising a selection switch to determine 
whether or not an optional one of the optimization 
functions must be achieved in an optional one of the 
optimizing phases, the selection switch being externally 
controlled to stop an optional optimization function in 
an optional phase. 

16. The compiling apparatus according to claim 1, 
wherein a physical memory map structure of main data 
of the intermediate representations during the interme- 
diate representation changing and optimizing processes 
is fixed; and thereby the fixed physical memory map 
structure are commonly used by the intermediate repre- 
sentation changing and optimizing processes. 

17. The compiling apparatus according to claim 1, 
wherein the optimizing structure determination means 
selects a plurality of optimizing structures, separately 
carries out optimization according to each of the se- 
lected structures, and selects the best optimizing struc- 
ture among the optimizing structures according to the 
optimization results. 

18. The compiling apparatus according to claim 1, 
wherein the optimizing means and intermediate repre- 
sentation changing means repeat the same predeter- 
mined optimization. 

19. The compiling apparatus according to claim 1, 
further comprising printing means for printing interme- 
diate representations in each phase and a selection 
switch for activating the printing means, 

the selection switch being externally controlled to 
print intermediate representations at optional tim- 
ing in any one of the phases. 

20. A compiling apparatus for compiling a source 
program in a data processing apparatus, comprising: 

a front end traiisforming a source program into inter- 
mediate representations; 

optimizing means for optimizing the intermediate 
representations from the front end in accordance 
with predetermined optimization functions; 

at least one intermediate representation changing 
means for further transferring the optimized inter- 
mediate representations from the optimizing means 
in accordance with predetermined optimization 
functions, the at least one intermediate representa- 
tion changing means including final intermediate 
representation changing means; 

code output means for providing code according to 
the intermediate representations obtained by the 
optimization from the file in intermediate represen- 
tation changing means; and 

optimizing structure determination means for deter- 
mining a number of repetitions required for optimi- 
zation by the optimizing means and intermediate 
representation changing means, and selecting the 
optimization functions for directing an optimizing 
process performed in each of the repetitions, and 
thereby the intermediate representations being 
changed and optimized. 
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21. A compiling method in a data processing appara- 
tus comprising steps of: 

a step for providing intermediate representations 
described by an intermediate language according 
to a source program; 5 

an optimizing structure determination step for deter- 
mining the number of repetitions of an optimization 
phase to execute optimization of the intermediate 
language and selecting optimization functions to 
direct an optimizing process carried out in each of 10 
the optimization phases; 

an optimizing step for optimizing the intermediate 
representations provided by the source program 
and the intermediate representations in each of the 
optimization phases, according to the determina- 15 
tion by the optimizing structure determination step; 

an intermediate representation changing step for fur- 
ther transforming the optimized intermediate rep- 
resentation in each of the optimization phases, ac- 
cording to the determination by the optimizing 20 
structure determination step; 



631 

20 

a determination step for determining the number of 

repetitions of an optimization phase; and 
a code output step for providing codes according to 

the finally obtained intermediate representations. 
22. A compiling method according to claim 21, 
wherein the optimizing structure determination step 
includes steps of: 
a pick up step for selecting at least one optimization 
function; 

a classification step for classifying the selected opti- 
mization functions into execution classes; and 
a compiler structure determination step for determin- 
ing the number of times of changing intermediate 
representations according to the types of interme- 
diate representations and intermediate dependen- 
cies, sorting and extracting applicable ones among 
the optimization functions in each optimization 
phase, and determining the execution order of the 
optimization functions according to dependencies 
of the functions. 

* » * « * 
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computer to implement an instruction set translator (1ST) 
for dynamically translating the machine language instruc- 
tions of an alien source computer into a set of functionally 
equivalent target computer machine language instructions, 
providing in the target machine, an execution environment 
for source machine operating systems, application sub- 
systems, and applications. The target system provides a 
unique pointer table in target virtual address space that 
connects each source program instruction in the multiple 
source virtual address spaces to a target instruction transla- 
tion which emulates the function of that source instruction in 
the target system. The target system efficiently stores the 
translated executable source programs by actually storing 
only one copy of any source program, regardless of the 
number of source address spaces in which the source pro- 
gram exists. The target system efficiently manages dynamic 
changes in the source machine storage, accommodating the 
nature of a preemptive, multitasking source operating sys- 
tem. The target system preserves the security and data 
integrity for the source programs on a par with their security 
and data integrity obtainable when executing in source 
processors (i.e. having the source architecture as their native 
architecture). The target computer execution maintains 
source-architected logical separations between programs 
and data executing in different source address spaces — 
without a need for the target system to be aware of the source 
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METHOD OF USING A TARGET 
PROCESSOR TO EXECUTE PROGRAMS OF 
A SOURCE ARCHITECTURE THAT USES 
MULTIPLE ADDRESS SPACES 

5 

BACKGROUND OF THE INVENTION 

The problem of moving programs from one computer 
architecture to another has given rise to a number of strat- 
egies for migration which minimize human effort. Among 10 
systems which support the same application interfaces, and 
exhibit the same security and data integrity characteristics, 
simple re-compilation of an application may suffice. How- 
ever, this does not work in cases where high-level language 
source code is unavailable or where there is no compiler 15 
written for the target platform. In these cases, a more general 
solution is necessary to solve the problem of application 
migration. 

Application migration is not the only problem. Applica- 
tion interfaces are built upon application enablement layers 20 
and operating systems, the attributes of which are highly 
dependent on processor architecture, and thus, may vary 
greatly from machine to machine. To move an application, 
it may also be necessary to port the application enablement 
environment, and possibly the underlying control program. 25 
Such an undertaking requires a great deal of human effort, 
and often fails to preserve the security and data integrity 
characteristics assumed by the users of the applications, 
because of operation on the original platform. A general 
solution to this second problem, which minimizes human 30 
effort and preserves desirable architectural attributes, would 
be quite beneficial in such cases. 

A general solution for the first problem, application 
migration, is provided by an emulation technique known as 35 
binary translation, which automatically converts each 
machine instruction in a program written for a source 
processor architecture into one or more target processor 
machine instructions, effectively translating the source 
machine program, instruction by instruction, into a target m 
machine program. 

Often, binary translation is incorporated into a run-time 
environment, in which a previously unencountered source 
instruction is dynamically translated into target instructions 
as needed, and the translation is saved and reused the next 45 
time that particular source instruction is to be executed. Of 
course, data structures must be employed to track which 
instructions of the source program have been translated, and 
where the translations are stored. Also, the original source 
machine program instructions must be maintained, in addi- 50 
tion to the target translations of those source instructions. 
This use of binary translation provides a relatively efficient 
means for emulating the computational and logical charac- 
teristics of a source architecture on a target processor. 

Binary translation is far superior to brute force emulation 55 
techniques, which interpret source instructions as they are 
encountered, extracting parameters such as register specifi- 
cations, and immediate data fields, and calling subroutines 
based on the opcode. Brute force approaches incur these 
penalties every time a source instruction is executed, while 60 
binary translation techniques incur such penalties only the 
first time a given instance of a source instruction is executed. 
Every subsequent execution of that particular instruction 
instance proceeds with the efficiency afforded by the trans- 
lated code. In general, these methods have the effect of 65 
producing target machine code that has little dependence on 
source architecture state information, and bears a strong 
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resemblance to code compiled natively for the target archi- 
tecture. 

While binary translation addresses the first problem, 
providing a general approach for application migration, it 
does not address the second problem, providing a general 
approach for migrating the application enablement environ- 
ment and control program. Binary translation exploits the 
fact that most processor architectures provide similar com- 
putational and logical operations, the aspects of the archi- 
tecture that are explicitly used by application programs. The 
layers below the application interface, which provide secu- 
rity and data integrity to the applications, and manage 
system resources which are shared by multiple independent 
applications, must use other facilities provided by a proces- 
sor architecture. These facilities may vary greatly from 
architecture to architecture, particularily between CISC 
(Complex Instruction Set Computer) and RISC (Reduced 
Instruction Set Computer) architectures, and often the facili- 
ties provided by one architecture have no analog in another 
architecture. The inefficiencies associated with emulating 
these facilities in software have historically been prohibi- 
tive, thereby restricting the use of binary translation to the 
application domain. 

For this reason, other means have generally been used to 
provide the characteristics of the interface to which emu- 
lated applications are written. The software layers underly- 
ing this interface are most often rewritten specifically for the 
target architecture at great expense. Depending on the dif- 
ferences in the underlying processor architectures, many of 
the security and data integrity characteristics of the envi- 
ronment may be sacrificed in the process. And, unique 
characteristics of the original operating system can be lost as 
a result. In other words, no general solution has yet 
addressed the second problem. 

The approach described in application Sen No. 08/349, 
771, entitled "Storage Access Authorization Controls In A 
Computer System Using Dynamic Translation of Large 
Addresses", assigned to the same assignee as the present 
invention and filed on the same days as the present inven- 
tion, incorporated here by reference, does provide a general 
approach to solving the second problem. By utilizing large 
virtual addressing in a target processor, it incorporates all the 
authorization mechanisms of a source processor architec- 
ture. The target processor dynamic address translation hard- 
ware is used to check the legality of storage accesses, and to 
map legal storage accesses such that they proceed with the 
efficiency of the target processor hardware after legality is 
established. By providing an efficient means to implement 
the authorization facilities of one architecture on another, it 
provides the impetus to extend binary translation techniques 
to cover the entire environment: the application layer, the 
application enablement environment, and the control pro- 
gram. 

It is the goal of this invention to have one computer, the 
target, provide the necessary program execution environ- 
ment of another computer, the source, by means of the 
binary translation of the instructions of the source computer 
to those of the target computer, such that the control pro- 
gram, application enablement environment, and application 
programs perform their functions on the target computer in 
the same manner as they do on the source computer. From 
the point of view of the computing establishment, the source 
machine characteristics are observed in the target machine 
operation, and include the security, integrity, and functional 
aspects, including those provided by the source operating 
system, all without the cost of implementing the desired 
elements of the source as native elements of the target. 
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Due to their limited scope, existing binary translation 
techniques are heavily optimized around the issues which 
exist in the application domain. They do not need to manage 
the additional complexities which are handled in the soft- 
ware layers beneath the application interface. The approach 5 
described in this invention re-focuses the implementation of 
these techniques, making them feasible for the emulation of 
the entire system environment. 

When the emulated portion of the system environment is 
extended beyond the application layer, to include the appli- 10 
cation enablement and control program layers, the ground 
rules change drastically, The purpose of a multitasking 
operating system is to maximize the value provided by 
computing resources, by efficiently distributing them 
between multiple applications. To do so it must manage the 15 
asynchronous events that are part of the total computing 
environment. As a consequence of such events, the operating 
system will interrupt the flow of one program and divert 
computing resources to another. Of course, it must keep 
track of the current instruction address for the interrupted 20 
program, so that the program can resume execution where it 
left off. 

As stated above, most existing binary translation 
approaches emulate the application domain only, and run on 
an application interface which runs natively on the target 25 
processor. From the perspective of the target operating 
system, the emulation environment built around each emu- 
lated source application is just another target application. 
Thus, when the multitasking target operating system pre- 
empts an emulated source machine program, the return 30 
address is actually a target machine address. 

However, when the emulated portion of the system envi- 
ronment includes the source operating system, and the 
interruptions themselves are emulated, the return addresses 35 
for preempted programs are source machine addresses. This 
is because the applications are running under the direct 
control of the source operating system, which dynamically 
allocates the system resources among them. Likewise, syn- 
chronous and asynchronous exceptions, such as page faults ^ 
and I/O interrupts, require that a source address be used to 
determine the point of re-entry into the interrupted program. 
This is because the source operating system handles these 
events within the emulated source operating environment. 
Essentially, even though the application is being executed by 45 
means of binary translation, it is executing in a source 
machine operating environment. 

When limited to the application domain, binary transla- 
tion requires code entry points determined by source 
addresses only for branches within an application which 50 
cannot be determined by static analysis. Existing techniques 
focus heavily on this, going to great lengths to minimize the 
number of source entry points. When extended to include the 
operating system, binary translation requires a code entry 
point following every possible point of program preemption. 55 
This results in a huge increase in the number of possible 
entry points, specified by a source machine address, in an 
emulated program. The invention described herein alleviates 
the severe performance repercussions that such an increase 
would cause in normal binary translation environments. ^ 
Also, existing techniques map source instructions in a single 
application address space to a set of translations for those 
instructions. When the entire system environment is emu- 
lated, there are multiple address spaces, each potentially 
containing instructions which must be emulated. 55 

The ramifications of dynamic address translation and the 
possibility of programs shared between multiple address 



spaces add complexity and potential for new optimizations, 
while vasdy increasing the scope of the translation manage- 
ment algorithms. For example, some operating system struc- 
tures map the same system code into all instruction address 
spaces in order to improve system performance of linkage 
to, and data access by, that code. If applied to a multi-space 
emulation environment, traditional techniques would pro- 
duce multiple translations for the same code, one for each 
address space that the code is part of. It is advantageous to 
minimize such duplication. The method described here 
discovers such multiple mappings of code through use of 
source DAT, and causes an existing translation of code to be 
used for all its uses in the source. 

Finally, the existing techniques are static in nature, since 
the code in an application environment is generally not 
modified during its execution. An entire system environ- 
ment, on the other hand, is very dynamic in nature. Given the 
objective of providing the full operational characteristics of 
the source operating system and those of its native applica- 
tion enablement programs in the emulated environment, full 
multitasking of multiple applications and application 
enablement programs must be supported. Application envi- 
ronments are instantiated and discarded. Programs are 
loaded, and overwritten. The source operating system con- 
trols the allocation and deallocation of the assigned source 
storage, allocated within the target machine storage. Real 
page frames are allocated in the source storage to back 
virtual pages, and eventually deallocated. Thus, the algo- 
rithms designed to maintain an emulated system environ- 
ment must support the management of this volatility by 
creating and destroying target translations as source instruc- 
tions are created and destroyed in the emulated environment, 

SUMMARY OF THE INVENTION 

A "source program" is herein defined as a program written 
for a processor architecture which is alien to the architecture 
of the target processor. A "source program" contains instruc- 
tions, operating states and requires executing facilities not 
found in the target processor architecture. In this specifica- 
tion, the term, source program, is therefore different from, 
and should not be confused with, a common use of the term 
"source program" as a program in a high-level language 
distinguished from the same program in executable object 
code form. Also, the words machine and processor are often 
used interchangeably throughout the description of this 
invention. 

As discussed in the background, this invention assumes 
the methods described in application Ser. No. 08/349,771 for 
implementing the source processor's architected state and 
the controls thereof on the target processor. The same 
requirements on source and target processor attributes are 
assumed as well, as this invention imposes no additional 
requirements beyond those described in application Ser. No. 
08/349,771. 

Application Sen No. 08/349,771 describes several struc- 
tures that are established in target machine virtual and target 
machine real storage that are used in an embodiment of that 
invention. These include, but are not limited to, target 
exploded virtual storage, to which all combinations of 
source effective addresses and storage access states map 
directly; source real storage, which is represented by a 
contiguous area of target real storage; the source processor 
registers, some of which are represented by target processor 
registers, and others of which are represented within target 
storage; and the source frame-to- PTE-list directory and 
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single-frame-PTE-list area, which are used to manage the 
mapping of multiple virtual pages to a single real page 
frame. These structures are assumed by this invention. 

This invention includes a number of additional structures, 
which maintain the target machine equivalents of source 5 
program instructions, in a manner similar to that of existing 
binary translation techniques, yet extended to assume the 
techniques of application Scr. No. 08/349,771, while 
addressing the issues associated with emulating an entire 
system operating environment. These structures include an 10 
instruction translation map (ITM), an instruction translation 
region (ITR), and additional structures which are required to 
effectively maintain the ITM and the ITR. The ITM asso- 
ciates source machine instructions, found in source storage, 
with the target machine translations of these instructions, 15 
which translations are found in the ITR. 

FIG. 1 shows a memory map of target machine virtual 
storage and a memory map of target machine real storage. 
The important structures used by an emulator which pertain 
to this invention are shown. The dotted arrows indicate the 20 
dynamic address translation (DAT) relationships between 
structures which exist in virtual storage and the structures in 
real storage which back them. 

This invention relies on a large virtual addressibility in the 
target machine, as described in the separate application Ser. 25 
No. 08/349,771, which is assumed by this invention. The 
structures which exist in target machine virtual storage are 
described here. 

The emulator control program (102) manages the target 3Q 
processor resources as specified in the flowcharts (FIGS. 9 
through 13) of this invention and the flowcharts (FIGS. 26 
and 27) of 08/349,771, which this invention assumes, by 
. handling target machine exception conditions and perform- 
ing various operations on the structures described in this 35 
invention and in 08/349,771 as specified herein, as well as 
managing the target machine page table, and any input/ 
output operations required by a specific embodiment. The 
emulator control program (102) is backed in real storage in 
part by the pinned control program area (107), and in part by ^ 
the demand paging storage pool (108). These methods for 
backing virtual storage with real storage are well known in 
the prior art. 

The instruction set translator (103) is a program which 
performs several functions related to translating one or more 45 
source machine instructions into a set of target machine 
instructions which perform the equivalent function, and 
maintaining the translations which it produces. The require- 
ments for the interface to the 1ST, the types of services 
provided by the 1ST, and certain attributes of the target 50 
machine instructions produced by the 1ST are specified in 
this invention and in 2nd in 08/349,771 (which is assumed 
by this invention). The portion of the 1ST which produces 
target machine translations of source machine instructions is 
hereafter called the translator function, and the portion of the 55 
1ST which provided storage management services is here- 
after called the translation management function. The 1ST 
(103) is backed in real storage by page frames from the 
demand paging storage pool (108). This methods for back- 
ing virtual storage with real storage is well known in the 60 
prior art 

Source machine exploded storage (104) is a very large 
area of target machine virtual storage which area encom- 
passes all of the exploded permutations of all source 
machine storage locations across all source machine virtual 65 
address spaces, the source machine DAT-off address space, 
and all source machine unresolved address space encodings. 



It is described in detail in 08/349,771 (which is assumed by 
this invention), where it is called target exploded virtual 
storage (since it resides in the target machine's virtual 
storage). The location of source machine exploded storage 
(104) in target machine virtual storage is indicated by BASE 
A, which specifies a displacement from target virtual storage 
address zero. BASE A is referred to later in this invention. 
Since source machine exploded storage (104) is the virtual 
representation of the contents of the source machine's 
memory, it is backed in target machine real storage by source 
machine real storage (109), which is a contiguous range of 
target machine real storage which contains the storage 
contents of the source machine being emulated. The algo- 
rithm which specifies the means and method by which 
source machine exploded storage pages are backed by 
source machine real storage page frames is a subject of 
application Ser. No. 08/349,771, which is assumed in this 
invention. In the figure, the arrow which indicates the DAT 
relationship between source machine exploded storage (104) 
and source machine real storage (109) is split at the end to 
indicate that multiple pages in source machine exploded 
storage (104) may be backed by the same page frame in the 
source machine real storage (109) portion of target machine 
real storage. 

This results in complications which are described in 
application Ser. No. 08/349,771, and later in this invention. 
Since source machine exploded storage (104) is a virtual 
representation of the entire contents of the source machine's 
memory, each source machine instruction existing in the 
source machine's memory will reside at one or more loca- 
tions in source machine exploded storage (104). 

The instruction translation map (105) occupies a very 
large range of target machine virtual storage. It is a large 
table of pointers to instruction translations. There is one such 
pointer in the ITM for every possible storage location at 
which an instruction can exist in a source machine address 
space across all possible source address spaces. 

For example, if a source machine supports a 31 -bit 
address, and requires that all instructions begin on a two 
-byte boundary, then there are 2**(31)/2=2**30=1 ,073,741 , 
824 possible locations at which a source instruction may 
reside in a source address space. If the source machine 
supports a 16-bit address space identifier, then there are 
2**(46>=65536 possible address spaces. Thus there are 2** 
(46)=70,368,744, 177,664 possible locations at which an 
instruction may reside across all source address spaces. It 
follows that the virtual representation of the ITM in this case 
would be a table in virtual storage only, containing 70,368, 
744,177,664 pointers to possible instruction translations. 
These translations exist in the instruction translation region 
(106) which is backed in real storage by the real ITR (114). 
Due to its immense size and sparcity, the virtual ITM (105) 
is backed by a significantly smaller pool of real storage 
called the real ITM cache (111). The algorithm which 
specifies the means and method by which virtual ITM pages 
are backed by real ITM cache page frames is described later 
in this invention. In the figure, the arrow which indicates the 
DAT relationship between the virtual ITM (105) and real 
ITM cache (111) is split at the end to indicate that multiple 
pages in the virtual ITM (105) may be backed by the same 
page frame in the real ITM cache (111) portion of target 
machine real storage. This results in complications similar to 
those described in 08/349,771 for handling source exploded 
addresses, and later in this invention. The location of the 
virtual ITM (105) in target machine virtual storage is indi- 
cated by BASE B, which specifies a displacement from 
target virtual storage address zero. BASE B is referred to 
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later in this invention. The ITM is divided into uniformly 
sized sections which are related to corresponding sections in 
the instruction translation region (106) and source machine 
real storage (109). The relationships between these are 
described later. 

The instruction translation region (106) also occupies a 
very large range of target machine virtual storage. It is 
divided into uniformly sized sections, such that each instruc- 
tion translation region (ITR) section maps directly to a 
corresponding section of source machine real storage (109) 
and to a corresponding section of the ITM (105). The details 
of these relationships are described later. The amount of 
target machine virtual storage allocated to a virtual ITR 
section must be sufficiently large to contain all of the target 
machine translated code associated with all of the source 
machine instructions in a section of source machine real 
storage (109) in a worst case scenario. There is one such ITR 
section permanently assigned to each section of source 
machine real storage (109). For example, a particular 
embodiment may choose to allocate 1,048,576 bytes for 
each virtual ITR section corresponding to a 1,024 byte 
source machine real storage section. If the source machine 
has a total of 2,147,483,648 bytes of real storage, the size of 
the virtual ITR would be 2,199,023,255,552 bytes. The 
instruction translations maintained in the virtual ITR (106) 
are located by the pointers contained in the ITM (105). Due 
to its immense size and sparcity, the virtual ITR (106) is 
backed by a significantly smaller pool of real storage called 
the real ITR (112). Real ITR page frames are allocated to 
virtual ITR pages in a demand paging fashion well known in 
the prior art. The location of the virtual ITR (105) in target 
machine virtual storage is indicated by BASE F, which 
specifies a displacement from target virtual storage address 
zero. BASE F is referred to later in this invention. 

The structures which exist in target machine real storage 
are described here. 

The pinned control program area (107) is an area of target 
machine real storage which is permanently assigned 
("pinned") to portions of the emulator control program (102) 
and its data structures. Such techniques are well known in 
the prior art. 

The demand paging storage pool (108) is an area of target 
machine real storage from which page frames are allocated 
as needed to back currently active portions of the emulator 45 
control program (102), the instruction set translator (103), 
and their associated data structures. Such techniques are well 
known in the prior art. 

Source machine real storage (109) is a contiguous range 
of target machine real storage which contains the real 
storage contents of the source machine being emulated. Page 
frames in source machine real storage (109) back the virtual 
pages in source machine exploded storage (104). The algo- 
rithm which specifies the means and method by which 
source machine exploded storage pages are backed by 
source machine real storage page frames is a subject of 
separate application Ser. No. 08/349,771, which is assumed 
in this invention. In the figure, the arrow which indicates the 
DAT relationship between source machine exploded storage 
(104) and source machine real storage (109) is split at the 
end to indicate that multiple pages in source machine 
exploded storage (104) may be backed by the same page 
frame in the source machine real storage (109) portion of 
target machine real storage. This results in complications 
which are described in application Ser. No. 08/349,771, and 65 
later in this invention. Since source machine real storage 
(109) is a real storage representation of the entire contents of 
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the source machine's memory, each source machine instruc- 
tion existing in the source machine's memory will reside at 
one and only one location in source machine real storage 
(109). The location of source machine real storage (109) in 
target machine real storage is indicated by BASE D, which 
specifies a displacement from target real storage address 
zero. BASE D is referred to later in this invention. As 
described earlier, source machine real storage is divided into 
uniformly sized sections. These are related to corresponding 
sections in the ITM (105) and in the ITR (106). The 
relationships between these are described later. The real ITM 
directory (110) is a table in target real storage which contains 
one pointer for each section of source machine real storage 
(109). If the pointer contains a NULL value, its correspond- 
ing source machine real storage section does not have an 
associated ITM section in the real ITM cache (111). If the 
pointer is valid, it points to the ITM section in the real ITM 
cache that is associated with its corresponding source 
machine real storage section. The real ITM directory exists 
in target real storage only, i.e., it is only accessed by the 
emulator when the target machine is in DAT-off mode. The 
location of the real ITM directory (110) in target machine 
real storage is indicated by BASE E, which specifies a 
displacement from target real storage address zero. BASE E 
is referred to later in this invention. 

The real ITM cache (111) is a pool of real storage which 
backs the most recently used pages in the virtual ITM (105). 
The algorithm which specifies the means and method by 
which virtual ITM pages are backed by real ITM cache page 
frames is described later in this invention. In the figure, the 
arrow which indicates the DAT relationship between the 
virtual ITM (105) and real ITM cache (111) is split at the end 
to indicate that multiple pages in the virtual ITM (105) may 
be backed by the same page frame in the real ITM cache 
(111) portion of target machine real storage. This results in 
complications which are described in application Ser. No. 
08/349,771, and later in this invention. The location of the 
real ITM cache (111) in target machine real storage is 
indicated by BASE C, which specifies a displacement from 
target real storage address zero. BASE C is referred to later 
in this invention. Since its backs the virtual ITM (105), the 
real ITM cache is also divided into uniformly sized sections. 
A section in the real ITM cache (111) that is associated with 
a given section of source machine real storage (109) may be 
located by consulting the real ITM directory (110). This is 
described in detail later. 

The target machine page table (112) exists in target real 
storage. It contains the page table entries which are used by 
the target machine DAT mechanism to translate target 
machine virtual addresses to target machine real addresses. 
The target machine DAT mechanism is not a subject of this 
invention, and the invention may embody any target DAT 
mechanism that does not preclude the association of mul- 
tiple virtual pages with a single real page frame. 

The frame- to-PTE-li st directory (FPD) and single-frame- 
PTE-list (SFP) are structures which are used to manage the 
mapping of multiple virtual pages to a single real page 
frame. This is discussed in application Ser. No. 08/349,771, 
and later in this invention. These structures exist in target 
real storage (113) only, i.e., they are only accessed by the 
emulator when the target machine is in DAT-off mode. 

The real ITR (114) is a pool of real storage which backs 
the most recently used pages in the virtual ITR (106). 
Techniques for backing recently used pages are well known 
in the prior art. 

FIG. 2 shows the relationships among the major structures 
used by the invention. 
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Source exploded virtual storage (201) is described in FIG. 
1 (104), and in detail in application Ser. No. 08/349,771, 
which is assumed by this invention. An address in source 
exploded virtual storage is the input to the process which 
finds the target machine instruction translation for the source 5 
machine instruction which resides at that address. Specifi- 
cally, the source processor effective instruction address, 
combined with the address space identifier for the address 
space from which an instruction is to be fetched indicates a 
unique source instruction reference from the program's 
perspective. 

The virtual ITM (202) is a table which contains an entry 
for each possible unique addressable instruction contained in 
source exploded virtual storage (201). It is assigned a 
contiguous area of the large target machine virtual address 
space. Because of this direct, one-to-one mapping, a rela- 
tionship (208) exists between source exploded virtual stor- 
age (201) and the virtual ITM (202), such that given an 
address into one, the corresponding address in the other may 
be determined directly. The virtual ITM (202) exists in target 2Q 
virtual storage only, thus, its size requirements, while sig- 
nificant, are accommodated by target machines which meet 
the large virtual storage requirements enumerated in appli- 
cation Ser. No. 08/349,771. The entries in the virtual ITM 
(202) are described in the following section which describes 
the real ITM cache (203). 

The real ITM cache (203) is the pool of target real storage 
which backs the most recently used entries in the virtual 
ITM (202). Thus, when a target machine page table entry 
(PTE) exists which maps a page in the virtual ITM (202) to 30 
a page frame in the real ITM cache (203), a relationship 
(209) exists which automatically maps references to the 
virtual ITM, by target machine dynamic address translation 
(DAT), to the real ITM cache. When no such PTE exists, the 
relationship (209) is non-existent. The management of such 35 
a situation is described later. Each ITM entry contains two 
elements. The first element is either a pointer to the set of 
target instructions which correspond to the source instruc- 
tion at a given source storage location, or a pointer to the 
entry point for a function, called the translator, which 40 
produces a set of target instructions given a source instruc- 
tion. The second element contains information related to the 
span of the source instruction image whose translation is 
mapped by the entry. The term ITM section is defined to be 
the unit of granularity by which the real ITM cache (203) 45 
backs the virtual ITM (202). For obvious reasons an ITM 
section must be no smaller than a page of target storage, 
however, it may be larger, provided that it is an exact 
multiple of the target page size. The term source storage 
section is defined to be a unit of source storage that directly 50 
corresponds to an ITM section. For example, if source 
instructions are addressable on two-byte boundaries, and 
ITM entries are eight bytes in size, and the size of an ITM 
section is set at 4096 bytes, the size of a source storage 
section would be 1024 bytes. The term ITR section is 55 
defined to be a unit of ITR virtual storage that directly 
corresponds to an ITM section and a source storage section. 
The ITR section is described fully below. 

The virtual ITR (204) is a large area of target virtual 
storage which contains the sets of target instructions or 60 
translator function entry points referenced by the first ele- 
ment of each entry in the real ITM cache (203). The virtual 
ITR (204) is divided into ITR sections that directly corre- 
spond to ITM sections and source storage sections. A direct, 
one-to-one relationship (215) exists between the virtual ITR 65 
(204) and source real storage (206), such that each source 
storage section in source real storage (206), has a cone- 
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spending ITR section in the virtual ITR (204). Therefore, 
ITR sections may be arbitrarily large, as long as the virtual 
ITR (204) contains one ITR section for each source storage 
section in source real storage (206). An ITR section should 
be large enough to hold the translations for all possible 
source instructions in the source storage section associated 
with it. The internal organization and storage management 
of the virtual ITR is described later. When an entry in the 
real ITM cache (203) holds a pointer to a set of target 
instructions which correspond to a source instruction, which 
target instructions will be found in the ITR section associ- 
ated with the ITM section, a relationship (210) exists 
through which a pointer in the real ITM cache is used to 
access an instruction translation. When an entry holds a 
pointer to the translator function for the ITR section, which 
translator function entry point will be found in the ITR 
section associated with the ITM section, the relationship 
(210) is non-existent The management of such a situation is 
described later. 

The real ITR (205) is the pool of target real storage which 
backs the most recently used pages in the virtual ITR (204). 
Thus, when a target machine page table entry (PTE) exists 
which maps a page in the virtual ITR (204) to a page frame 
in the real ITR (205), a relationship (211) exists which 
automatically maps references to the virtual ITR, by target 
machine dynamic address translation (DAT), to the real ITR. 
When no such PTE exists, the relationship (209) is non- 
existent The management of such a situation is described 
later. 

Source real storage (206) is the contiguous block of target 
real storage which direcdy represents source machine real 
storage. It is the ultimate repository of source programs, 
source data, and source control structures such as page 
tables. Since it either backs all source virtual storage pages 
or contains the structures used by the algorithms which back 
all source virtual storage pages on demand, there is a 
relationship (212), by which references to source exploded 
virtual storage (201) are mapped to source real storage (206) 
through the use of the source DAT mechanisms. As 
described above, there is a direct, one-to-one relationship 
(215) between each source storage section in source real 
storage, and its corresponding ITR section in the virtual ITR 

(204) , That is, the translations for the instructions found in 
a source storage section in source real storage (206) are 
maintained in the corresponding ITR section found in the 
virtual ITR (204), backed by pages from the real ITR (205). 

The real ITM directory (207) is a tabic, which exists in 
target real storage, that contains an entry for each source 
storage section in source real storage (206). Because of this 
direct, one-to-one mapping, a relationship (213) exists 
between source real storage (206) and the real ITM directory 
(207). Each entry in the real ITM directory (207) contains 
either a pointer to the ITM section in the real ITM cache 
(203) which contains the set of entries which map the source 
instruction images contained in the source storage section 
found in source real storage (206) which corresponds to the 
entry in the real ITM directory (207), or a NULL pointer if 
there is no ITM section in the real ITM cache (203) 
corresponding to the source storage section in source real 
storage (206). In the former case, a relationship (214) exists 
through which a pointer in the real ITM directory (207) is 
used to access an ITM section in the real ITM cache (203). 
In the NULL case, the relationship (214) is non-existent The 
management of such a situation is described later. 

For simplicity the real ITM cache (203) and the real ITR 

(205) have been described as independent pools of real 
storage, dedicated to backing their respective virtual struc- 
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tures. Depending on the embodiment, however, it is likely 
that the real ITR (205), and possibly the real ITM cache 
(203) may be drawn from a common, demand-paging pool 
of real storage in the target machine. However, source real 
storage (206) and the real ITM directory (207) must be 
independent contiguous areas of target real storage. 

The goal of efficiently locating a set of target machine 
instructions associated with a source machine effective 
instruction address is attained as described below. It is 
assumed in this process that any source page fault caused by 
an access to the source instruction being executed has been 
resolved, i.e., that relationship (212) exists. This may be 
tested by employing the techniques described in application 
Ser. No. 08/349,771 to perform an emulated source instruc- 
tion fetch. As stated above, the source effective instruction 
address combined with the source address space identifier 
for the source address space in which the source instruction 
exists (both of which may be extracted from the source 
exploded virtual address) yield an index into the virtual ITM 

(202) . When the DAT relationship (209) exists (it does in 
steady-state operation), an address into the real ITM cache 

(203) is produced by the target machine DAT hardware. The 
case in which the DAT relationship (209) does not exist is 
called an ITM fault, and is described later. In the case where 
the DAT relationship (209) does exist, the referenced real 
ITM cache entry contains a pointer. When this pointer 
references the set of target instructions that make up the 
translation of the source instruction (the steady-state case), 
i.e., the relationship (210) exists, the address into the virtual 
ITR (204) is known. The case in which the relationship (210) 
does not exist is called a translation miss, and is described 
later. If the relationship (210) does exist, the address into the 
virtual ITR is converted, by target machine DAT, into an 
address into the real ITR. If the DAT relationship (211) 
exists, the process is complete, and the set of target instruc- 
tions may be executed. If the relationship (211) does not 
exist, an ITR fault occurs, and the target machine must 
allocate a frame from the real ITR to back the virtual page. 

In the event of a translation miss, i.e., relationship (210) 
does not exist, the pointer in the ITM points to the translator 
function entry point in the ITR section instead of a transla- 
tion. The translator function (or 1ST) has the responsibility 
of producing a set of target instructions to perform the 
function of the source instruction, managing the placement 
of these instructions into the virtual ITR section associated 
with the source real storage section from which the source 
instruction image was obtained, replacing the reference to 
itself in the ITM, with a pointer to the newly created target 
instructions, and indicating the span of the translation in the 
ITM. This action establishes relationship (210), and com- 
pletes the process of finding the desired set of target instruc- 
tions. 

In the event of an ITM fault, i.e., relationship (209) does 
not exist, the target machine observes a page fault, and the 
appropriate target machine exception handler is invoked. 
The exception handler determines, from the fact that the 
faulting virtual address referenced the area of target virtual 
storage allocated to the virtual ITM, that an ITM fault has 
occurred. The faulting address into the virtual ITM (202) is 
converted, by direct relationship (208), to an address into 
source exploded storage (201), which is converted, by 
source DAT relationship (212), to an address into source real 
storage (206). Note that when an ITM fault occurs, this DAT 
relationship (212) is guaranteed to exist, as stated above. The 
address into source real storage (206) is converted, by direct 
relationship (213), to an address into the real ITM directory 
(207). If the real ITM directory entry referenced by this 
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address contains a NULL pointer, the pointer relationship 
(214) does not exist. This situation is called an ITM miss, 
and is described later. If the relationship (214) does exist, the 
pointer in the real ITM directory entry points to the ITM 
section in the real ITM cache (203) which is to back the ITM 
section in the virtual ITM (202) which caused the ITM fault. 
One or more target PTE's (depending on the size of an ITM 
section) are allocated to establish the DAT relationship (209) 
between the faulting ITM section in the virtual ITM (202) 
and the ITM section in the real ITM cache (203). As a means 
to detect modification of the instruction stream, a control 
mechanism must be implemented using either the methods 
described in application Ser. No. 08/349,771 or a native 
store protection mechanism inherent in the target processor 
architecture, to protect the emulated source storage pages 
related to the virtual ITM section from stores. This will 
cause any stores to source storage that is associated with an 
ITM section that exists in the real ITM cache (203), to result 
in a target exception, so that appropriate action may be taken 
to update the ITM and ITR when source instruction images 
are modified. Such a control mechanism is described later. 
Once relationship (209) is established and the source storage 
protections are in place, the original interrupted process can 
resume. 

In the event of an ITM miss, i.e., relationship (214) does 
not exist, there is no ITM section in the real ITM cache (203) 
to back the ITM section in the virtual ITM (202), so one 
must be allocated. Normally such an allocation requires that 
an old ITM section in the real ITM cache (203) be destroyed 
to make room for the new ITM section. The algorithm used 
to determine which ITM section to destroy is not a subject 
of this invention, although it is recommended that this 
algorithm employ some kind of least-recently-used (LRU) 
scheme, such as testing the reference bits associated with 
pages of storage. When destroying the ITM section in the 
real ITM cache (203), the real ITM directory (207) entry 
associated with the old ITM section must be set to NULL. 
This is done by looking up any of the entries in the ITM 
section to determine the ITR section, and from that, to 
determine the source real storage section, and from its 
address, to determine the real ITM directory entry. Also, all 
virtual mappings to the old ITM section must be destroyed 
as well. Since it is possible that multiple mappings may 
exist, it is necessary to keep track of all virtual ITM (202) 
pages associated with a given real page frame in the real 
ITM cache (203). This may be accomplished using the 
techniques for managing multiplc-virtual-to-single-rcal 
page mappings described in application Ser. No. 08/349, 
771. A particular embodiment may determine that, at this 
time, it is advantageous to remove the store protection from 
the source storage associated with the ITM section. This, 
however, is not required, as a particular embodiment may 
decide to wait until an exception occurs and, at that time, 
upon discovering that the store protection exists for no 
apparent reason, the embodiment may remove it. It is also 
possible, though not required, to free real page frames that 
back virtual pages in the virtual ITR (204) that are refer- 
enced by the ITM entries in the ITM section that is being 
destroyed, since the information they contain is no longer 
relevant Once the old ITM section is destroyed, a new one 
may be created. This is accomplished by using the address 
into source real storage (206) which was determined during 
the resolution of the initial ITM fault, and converting it by 
direct relationship (215), to the address into the virtual ITR 
(204). This address is rounded down to the beginning of an 
ITR section. An ITR header structure is placed at this 
location. An ITR header structure is used to provide a unique 
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entry point to the translator function, which is associated 
with a particular ITR section, and to provide the state 
information used by the translator function to manage the 
translations maintained in that ITR section. The organization 
of the ITR header structure is determined by the translation 5 
management function. Note that storing the ITR header 
structure may cause a target page fault, which must be 
managed in normal fashion by demand paging from the real 
ITR (205). All entries in the new 1TM section are initialized 
to point to the translator function entry in the ITR header 
structure. The real storage address of the new ITM section 
(in the real ITM cache) is placed in the real ITM directory 
(207) entry which was determined during the resolution of 
the initial ITM fault, thus establishing the pointer relation- 
ship (214). It is necessary to apply the storage protection 
mechanisms referenced earlier to emulated source storage 15 
page(s) associated with the new ITM section. Once rela- 
tionship (214) is established and the source storage protec- 
tions are in place, the ITM fault process can resume. 

The goal of niaintaining a single set of translations for 
programs that are mapped into multiple source effective 20 
address spaces is attained by taking advantage of the fact 
that such duplication exists only in source exploded virtual 
storage (201), and not in source real storage (206), which is 
the ultimate repository for source machine instructions. By 
associating the target translations of source instructions, 25 
which translations are found in the virtual ITR (204), with 
the mapping of those source instructions in source real 
storage (206), unnecessary duplication can be eliminated. 
Thus, relationship (215) establishes this association of 
instruction images in source real storage (206) to their target 30 
machine equivalents in the virtual ITR (204). At odds with 
this, is the requirement that a translation must be located 
given the address of its corresponding source instruction in 
source exploded virtual storage (201). There may be many 
such source exploded virtual locations which are backed by 35 
the same source real location, and are thus associated with 
the same translation. Therefore, the. ITM, which associates 
source instruction images with their translations in the ITR, 
must index the translations by components of source 
exploded virtual addresses, while the translations them- 40 
selves are associated with source real addresses. Another 
important consideration is minimizing the target real storage 
requirements of the ITM, by eliminating duplication there as 
well. 

An optimal solution is attained by utilizing the free 45 
association provided by target DAT to break the ITM into 
two representations: a virtual representation, the virtual ITM 
(202), which has a direct relationship (208) to source 
exploded virtual storage (201); and a real representation, the 
real ITM cache (203), in which each real ITM cache (203) 50 
section has a one-to-one relationship (210) with a virtual 
ITR (204) section, which, in turn, has a one-to-one relation- 
ship (215) with a source real storage (206) section. Thus, for 
each section of source real storage, there is at most one set 
of instruction translations in the ITR, and at most one ITM 55 
section in the real ITM cache (203). In other words, regard- 
less of the number of source address spaces that contain a 
particular set of instructions, only one set of translations 
exists in the target real storage, and all virtual ITM sections 
for the particular set of instructions are backed by the same 60 
real ITM cache section. This minimizes target real storage 
consumption. Target DAT provides the relationship (209) 
which ties together the virtual ITM (202) and the real ITM 
cache (203). The techniques described in application Ser. 
No. 08/349,771 are employed to manage the multiple- 65 
virtual-to-single-real mappings and the real ITM directory 
(207) is used as described earlier to create the mappings. 
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The goal of managing the volatility of an emulated source 
machine operating environment is attained by incorporating 
the ability to detect changes in that environment, and the 
means to manage system resources so as to react to those 
changes in an efficient manner. This is closely related to the 
goal of an operating system, and, in fact, the emulated 
source machine operating system will address these issues 
insofar as they pertain to those portions of the target machine 
that represent source machine facilities. The ITM and ITR 
are not source machine facilities, and the source operating 
system has no knowledge of them. They are, however, 
affected by the actions of the source machine, and therefore 
it is necessary to detect such source machine actions and 
reflect their consequences in the ITM and ITR, At the same 
time, the target machine must manage the resources used by 
the ITM and ITR as efficiently as possible. 

The translations maintained in the ITR are dependent on 
the instruction images in source real storage (206). There- 
fore, if a translation exists in the ITR for a given source 
instruction image, and that source instruction image is 
modified by a source machine operation, such modification 
must be detected, and the ITM and ITR must be updated 
accordingly, to reflect the effects of the modification. 

To this end, a store protection mechanism, provided either 
by the techniques described in application Ser. No. 08/349, 
771, or by the target machine hardware, is employed, which 
causes a target machine exception whenever a store to a 
protected page occurs. As in application Ser. No. 08/349, 
771, storage references can be qualified in target exploded 
virtual addresses by whether they are fetches or stores. 
Fetches can be allowed to operate at full processor speed. 
Stores to source pages containing translated instructions can 
be trapped by a page fault on the exploded address, and 
checked to determine whether or not they will modify a 
translated instruction. On the other hand, a target processor 
may contain a facility to protect a page from a store while 
allowing fetches to proceed without interference. Such a 
mechanism may protect a real storage page frame, as does 
the storage key of IBM S/390, or it may protect a virtual 
page by providing the specification of store protection in a 
page table entry, as in the PowerPC architecture. This 
invention requires that stores into already translated source 
instructions be detected, but is not dependent on the specific 
mechanism used to do so. The methods described provide 
correct functional operation of the total source operating 
environment regardless of such changes in instructions. This 
is achieved by the dynamic binary translation of the instruc- 
tions not yet translated, and the detection of stores into 
already translated instructions and the invalidation of 
affected translations. The protection is applied to all pages 
which represent source machine pages for which an ITM 
section resides in the real ITM cache (203). This is accom- 
plished by identifying all such pages at the time when the 
ITM section is created (refer to ITM misses), and, if nec- 
essary, by protecting virtual pages or their backing page 
frames as they are added to the set of such pages (refer to 
ITM faults). 

When such an exception occurs, the target machine 
exception handler must first determine if the exception was 
caused by storing into a page containing instructions for 
which an ITM section and ITR region currently exist. This 
can be accomplished quickly, by determining the real ITM 
directory (207) entry associated with the source real storage 
backing the store which caused the exception. If the entry 
contains a NULL pointer, there is no ITM section associated 
with the storage. This points to one of two possibilities: 
either a source authorization mechanism caused the excep- 
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tion, or there was previously an ITM section associated with 
the source page, and when that ITM section was destroyed, 
the store protection was not removed. If the real ITM 
directory (207) entry contains a valid pointer, there is an 
ITM section associated with the storage. In this case there 5 
are also two possibilities: there may or may not be a source 
storage authorization violation in addition to the ITM related 
exception. Thus, regardless of whether the ITM section 
exists, the methods of application Ser. No. 08/349,771 must 
be used to determine whether or not a source storage 
authorization violation has occurred. If so, the appropriate 
source exception must be emulated. in the source operating 
environment. If not, and no ITM section exists, the store 
protection can be removed, and the violating store re- 
executed. If not, and an ITM section does exist, the indi- 
vidual ITM entries affected by the store must be checked. 15 
This case, in which pointer and span information in these 
entries indicates that a translation has been affected by an 
overwrite of all or part of a source instruction, is called a 
translation corruption, and is described later. In the case in 
which the pointers in affected entries point to the translator 20 
function, no translations are affected, and the target machine 
can simply emulate the effects of the store, and return 
control to the instruction following the store. 

In the event of a translation corruption, the ITM and ITR 
must be updated to reflect the change in source machine 25 
storage. The span field of the ITM can be used to calculate 
the scope of a store operation into a program. If any part of 
an already translated source instruction is altered during 
source machine execution, either its ITM entry must be reset 
to return to the translator if the translation is subsequently 30 
executed, or the real ITM section can be deallocated. Spe- 
cific actions in consequence of such an invalidation of an 
existing translation are the responsibility of the translation 
management function. Regardless of the action of the trans- 
lation manager, control must finally be returned to the 
original exception handler, to emulate the modification of 35 
source machine storage. 

As a result of this kind of storage volatility, it may not be 
wise to expend a great deal of effort optimizing the trans- 
lated code. Due to the shorter life span of translations in a ^ 
fully functional source operating system, the fixed overhead 
associated with optimizing a translation contributes to a 
higher per-execution overhead. One of the effects of less 
optimization is an increase in the number of source machine 
code entry points. Without the efficient lookup mechanisms 45 
described earlier, including beneficial use of the large virtual 
address space of the target processor, such an increase would 
produce unfavorable performance implications. Thus, the 
benefit realized by these mechanisms is increased by the 
effects of the storage volatility in the full source operating 5Q 
environment emulation. 

It is an object of this invention to have one computer, 
called the target, provide the complete program execution 
environment of another computer, called the source, by 
means of binary translation of the instructions of the source 55 
computer to those of the target computer such that the 
control program, the application enablement programs, and 
the application programs of the source execute on the target 
as they do on the source without change to any of the 
programs. 60 

It is another object of this invention to provide the 
program execution environment of a source computer on a 
different target computer, using binary translation of source 
instructions to target instructions, and utilizing the methods 
of application Ser. No. 08/349,771 to provide the storage 65 
access authority states of the source in the target operation 
of source programs. 



It is another object of this invention to use the large virtual 
address space of the target machine to provide a direct 
addressing relationship, requiring a minimum number of 
steps for access, between source virtual addresses and an 
instruction translation map indicating target machine 
instruction translations of the source instructions at those 
addresses. 

It is another object of this invention to use the large virtual 
address space of the target machine to provide a direct 
addressing relationship, requiring a minimum number of 
steps for access, between each section of source real storage 
within the target and the section of target virtual storage 
which contains the target instruction translations instruc- 
tions in the section of source real storage. 

It is another object of this invention to use the large virtual 
address space of the target machine to provide a direct- 
address map of source machine instructions to target 
machine instruction translations, in addition to its use for 
providing the storage addressing required by the source 
rirogramming execution environment, which includes those 
source instructions. 

It is another object* of this invention to back in target real 
storage the direct virtually-addressed map that associates 
source instructions with the corresponding target instruction 
translations in such a way that those portions of the map 
which associate source instructions that are common to 
multiple source address spaces with corresponding target 
translations are backed by the same target real storage. 

It is another object of this invention to maintain a single 
copy in target real storage of the target machine instruction 
translations of source instructions that are common to mul- 
tiple source virtual address spaces. 

It is another object of this invention to use target machine 
real storage to back only the sections of the direct map of 
source machine instructions to target machine translations 
for those instruction translations currently being maintained, 
indicating these sections by use of a directory which asso- 
ciates source real storage sections with the target real storage 
map sections for target translations of source instructions in 
those source real storage sections. 

It is another object of this invention to use target real 
storage to back only the sections of the target translations of 
source instructions for which a map section exists in target 
real storage, which associates source instructions with valid 
target translations. 

It is another object of this invention to use the method of 
application Ser. No. 08/349,771 to relate all map sections in 
target virtual storage backed by a single map section in 
target real storage, because they are for the same source 
machine instructions in multiple source address spaces, such 
that given any specific map section in target real storage, all 
map sections in target virtual storage that it backs can be 
determined. 

It is another object of this invention to impose store 
protection on target storage containing source machine 
instructions whose target translations are currently being 
maintained, by using either the methods described in appli- 
cation Ser. No. 08/349,771 or the native facilities of the 
target machine, such that modifications made by a source 
machine program to such instructions are detected. 

It is another object of this invention to record the source 
machine storage span represented by a target machine trans- 
lation of a source instruction, such that, on source machine 
storage operations modifying any part of already translated 
source instructions, it can be determined which translation 
or translations must be discarded, if any. 
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It is another object of this invention to use the large virtual 
address space of a target machine to provide a unique area 
of target machine virtual storage for each unique section of 
source machine real storage, such that this unique area of 
target machine virtual storage is of sufficient size to hold the 5 
target machine translations for every source machine 
instruction found in the source machine real storage section, 
and that this unique area of target machine virtual storage 
may be backed in target machine real storage by demand 
paging to manage the amount of real storage required to io 
store the translations. 



BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 shows the virtual storage and real storage maps of 15 
the the target machine processor. 

FIG. 2 shows the addressing relationships among the 
major structures used in this invention. 

FIG. 3 illustrates the addressing relationship between a 2 q 
source instruction address in source exploded virtual form 
and the virtual address of the ITM entry that indicates its 
translation state. 

FIG. 4 illustrates the relationship between an ITM virtual 
section and a backing real ITM cache section. 25 

FIG. 5 illustrates how the address of an existing backing 
real ITM section can be found when a page fault occurs on 
a virtual ITM reference. 

FIG. 6 shows the relationship between source real storage 
sections and target virtual ITR sections. 30 

FIG. 7 shows the relationship real ITM entries and the 
ITR sections that contain instruction translations associated 
with the ITM entries. 

FIG. 8 shows the structure that associates target PTEs 35 
resolving different virtual ITM sections to the same real ITM 
section. 

FIG. 9 shows the target processor overall logic for execut- 
ing a source instruction. 

FIG. 10 illustrates the target processor logic for handing 40 
an instruction translation miss occurs. 

FIG. 11 shows the processing performed when a page 
fault occurs on a reference to an ITM entry. 

FIG. 12 shows the processing when a real ITM section 
must be assigned to back a virtual ITM section. 

FIG. 13 shows the process for a source program store 
operation into a source real storage section from which some 
instructions have already been translated to target instruc- 
tions. 
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EMBODIMENT OF THE INVENTION 

The preferred embodiment shows the techniques of this 
invention as applied to the emulation of the IBM S/390 55 
(source) processor architecture on the 64-bit version of the 
IBM PowerPC (target) processor architecture. A 31 -bit 
S/390 virtual or real address is assumed. A 24-bit ASID is 
assumed. A 4-bit S/390 PSW key is assumed. A maximum 
of 2 additional bits are allotted for other S/390 access control 60 
fields (see 08/349,771, which is assumed by this invention). 
A 64-bit PowerPC virtual address is assumed. A 32-bit 
PowerPC real address is assumed. The page size of both the 
S/390 and the PowerPC architectures is 4096 bytes. None of 
these metrics are essential to the proper operation of the 65 
invention. They represent specific attributes of this particu- 
lar ernbodiment. 



FIG. 3 illustrates the addressing relationship between a 
source instruction address in source exploded virtual form 
(301) and the virtual address of the ITM entry (303) that 
indicates its translation state. The low order zero of the 
exploded address indicates that source machine instructions 
must start on a two-byte boundary. The ASID and instruction 
address portion of the exploded address are shifted two bits 
left because each ITM entry is eight bytes wide (four times 
larger than the two byte logical instruction location). The 
base B is the starting point of the ITM area (305) in the target 
virtual address map (see FIG. 1). Concatenating B with the 
shifted ASID and instruction address gives the virtual 
address of the ITM entry for the instruction at the source 
exploded virtual address. As explained earlier, the ITM entry 
contains a target virtual address (307) to be branched to 
during source program execution. The address will be that of 
the target instructions performing the function of the source 
instruction (306) if a translation already exists. Box 304 
represents the total virtual storage containing all source 
instructions in all address spaces. If a translation does not 
exist, the address will be that of the translator, which, when 
entered in this way, will resume instruction translation at the 
source exploded virtual address of the untranslated instruc- 
tion of the ITM entry. In this embodiment, target branches 
must be to locations on 4-byte boundaries, so the two 
low-order bits of the address are used to indicate the span 
(308) of the translation in terms of two byte units. That is, 
since the source instructions of this embodiment can be of 
length 2, 4, or 6, the span can indicate the scope of a 
translation in terms of the source storage occupied by the 
instruction translated. This is used when stores into the 
instruction stream arc detected in order to discover which 
translations are contaminated and must be invalidated. The 
first bit indicates whether or not the ITM entry represents 
two bytes that are part of an instruction starting before these 
two bytes, and the second bit specifies whether or not the 
next two bytes in source storage are part of the same 
instruction as these two bytes. This mechanism is general 
and allows specification for source instructions of any 
length. The values assumed in this embodiment are speci- 
fications of the assumed source and target processors and are 
not essential to the invention. Persons skilled in the art can 
easily design an ITM and addressing relationship to fit any 
variety of source instruction length and target branch 
address specification. 

FIG. 4 illustrates the relationship between an ITM virtual 
section and a backing real ITM cache section. Generally, 
storage management efficiency would tend to cause an ITM 
section to be designed as a target page in size in a particular 
embodiment. In this embodiment that is 4K bytes. However, 
this is not key to the invention. As indicated, target Dynamic 
Address Translation (DAT) forms the link between a virtual 
ITM section and its real ITM backing section in real storage. 
The Target Virtual ITM Entry Address (303) is as described 
for FIG. 3. Target DAT results in the target real ITM Cache 
Entry Address (402) which accesses the ITM in target real 
storage. Box 305 represents the virtual storage of the virtual 
ITM, while box 404 represents the entire real storage 
allocation for the real ITM cache. Fields 307 and 308 
represent the Translation pointer and Span in a virtual ITM 
entry, while 407 and 408 represent the same fields in the real 
ITM backing store. These arc the very same fields with 307 
and 308 in virtual storage, and 407 and 408 in real storage. 
The fields are 64 bits wide with two bits containing the 
instruction span information, as explained earlier. 

FIG. 5 illustrates how the address of an existing backing 
real ITM section can be found, if one exists, when a page 
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fault occurs on a reference to an entry in a virtual ITM 
section. The process is described in FIG. 11. When such a 
page fault occurs, source DAT is performed on the ASH) and 
instruction address (shown in FIG. 3) of the instruction to be 
performed in order to find its address in source real storage 5 
(504). In this embodiment, an ITM entry is 4 times as large 
as the instruction unit it represents. Therefore, if an ITM 
section sire is chosen to be a page, the source storage section 
it represents is one-quarter of a page in size. Thus, to find an 
ITM section representing a particular instruction section it is 10 
necessary to include the two high-order bits of the within- 
a-page address of an instruction address, shown in the 
Source Real Storage Section Address (501) in FIG. 5 as XX, 
along with the Page-frame-real-address (PFRA). There is a 
Real ITM Directory (505) which indicates, for each Source 15 
Real Storage section, whether an ITM real section (509) 
exists in the Real ITM Cache or not, and, if so, what its 
address is. The Real ITM Directory Entry address (502) is 
calculated as shown by the relative source real storage 
section number (the PFRA extended on the right by the two 20 
high-order bits of the relative-within-a-page address, which 
are marked XX). In the general case, division is used, but in 
this embodiment shifting provides an efficient means for 
doing the division. In this embodiment, for example, an ITM 
section is 4K bytes, while a Real Storage section is IK bytes. 25 
Thus, an ITM section represents the instructions in a quarter- 
page. To locate the proper ITM section for a source instruc- 
tion, the relative address must include the two bits that 
specify which quarter of a page an instruction is in. This 
relative address is shifted to properly address the Directory 30 
entries as 4-byte entities. The base E is the starting point of 
the Directory in target storage, and is preappended to the 
relative Directory entry address to access it in target storage. 
As shown, the Real ITM Directory entry (508) contains the 
relative address of the Real ITM section for that source 35 
section, if one exists. It is combined with the base C to form 
the target machine real address (503) of the ITM section in 
the real ITM cache. It will be used to back all virtual ITM 
sections that represent those instructions. In other words, the 
same Real ITM section will back all virtual ITM sections for 40 
all address spaces that are in fact the same instructions. 
During emulation, this permits a set of the exact same 
instruction images which are a part of more than one 
instruction address space to use the same instruction trans- 
lations. In S/390 systems, such storage areas, mapped into 45 
multiple address spaces, are called "Common" areas. For 
practicality, these areas are defined in units of the processor 
paging storage unit, i.e., they occupy an integral number of 
pages at page boundaries. In S/390, the same relative virtual 
pages of Common areas in all ASID's are backed in real 5 q 
storage by the very same pages since their content is, by 
definition, exactly the same. However, the exploded target 
virtual addresses of the areas are different in order to detect 
the specific ASID of reference in order to check the validity 
of the reference (see application Ser. No. 08/349,771, which 55 
is assumed by this invention). Since these different virtual 
addresses are really the "names" of the very same informa- 
tion, the same Real ITM section can back the different 
virtual ITM sections for them. If the Real ITM Directory 
indicates that there is no assigned Real ITM section in the 60 
Real ITM Cache, the emulator must newly assign a free page 
from the Real ITM Cache area to back the virtual ITM that 
is being processed, and indicate this in the Real ITM 
Directory Entry for the Source Real Storage section. This is 
described in FIG. 12. ^ 

FIG. 6 shows the relationship between source real storage 
and the target virtual ITR sections that contain the target 
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translations of all source instructions in the source real 
storage sections. As shown on the left, a source page (607), 
addressed by a source exploded virtual address (601) and 
mapped into target exploded virtual storage as a contiguous 
page, is actually four contiguous storage sections. That is 
because, in this embodiment, an ITM entry is four times as 
wide as a source minimal logical instruction. To make an 
ITM section one target page, the source storage section 
covered by that ITM is one-quarter of a page. Performing 
source machine DAT on the source address portion of the 
source exploded virtual section address locates the source 
real section in real page 608 in source real storage. Since 
relocation occurs at the page level, the page is contiguous in 
target real storage within the portion reserved to represent 
source real storage. The source real storage section address 
in target real storage is obtained by pre-appending base D to 
the beginning of it. Each storage section of each source real 
page will have its source-to-target instruction translations 
placed into a different Instruction Translation Region (ITR) 
section (609), as indicated by the dotted line between the 
two rightmost rectangles in tie figure. In this embodiment, 
the size of an ITR section has been specified as one 
megabyte, though that is not central to the invention. An ITR 
section should be large enough to be expected to hold all of 
the translations of a source real storage section in order to 
avoid the complications of storage management that would 
otherwise be required. Given a source real storage section 
address (501), shown in the top portion of the figure, the 
virtual address (603) of the corresponding ITR section can 
be calculated. However, ITR sections are backed in target 
real storage only as required to hold the translations of 
source instructions, and this assignment can occur by 
demand paging as required. Thus, the direct relationship of 
a virtual ITR section with each section of a source program 
that may contain instructions is not costly in terms of real 
resources required by such a virtual storage assignment. 
That is because target DAT is used for the translation of 
virtual ITR references to target real addresses, by executing 
the process in target machine DAT-on mode. The address 
translation indicated between the second and third address 
types illustrated at the top of the figure uses the base F to 
properly place the ITR address into the assigned part of 
target virtual reserved for ITR sections. The page number is 
shifted to accomodate the expansion of ITR required beyond 
the size of the storage section (to one megabyte multiples, 
one for each source storage section), and the two high-order 
bits of the within-page address are used to address the proper 
storage section within the page. Generally, this transforma- 
tion is a multiplication operation, but in this specific embodi- 
ment shifting is used as an efficient multiplication means. 

FIG. 7 shows the relationship between ITM entries in 
ITM section 509 in the Real ITM Cache (404) and areas of 
the single ITR section (609) assigned to hold instruction 
translations of source instructions in the real storage section 
whose translation state is indicated by the ITM entries (eight 
bytes wide). The real ITM cache section address is illus- 
trated by 503. The virtual ITR section address is depicted by 
603.. The ITM page is shown at the bottom of the figure on 
the left. The virtual ITR section 609 within the virtual ITR 
region 606 addressed by these entries is shown on the 
bottom to the right. As shown, some entries address instruc- 
tion translations. Others address the address of the Transla- 
tor (or 1ST) found in each ITR header. The latter is true when 
no translation yet exists for a source instruction unit (two 
bytes in this embodiment) but an ITM page does exist in the 
Real ITM Cache (404) for the source section containing an 
instruction unit. The only address relationship between a 
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Real ITM Cache section and a Virtual ITR section is the 
direct address pointers within the ITM entries themselves. 

An ITM section represents the instructions potentially in 
a target virtual exploded storage section. Many such sections 
can be represented by the same ITM section in real storage 
because they represent the same source byte images in 
source real storage. When it is found that a virtual ITM 
section represents a section of source real storage for which 
there is an already existing ITM section, the virtual ITM 
section for it is backed by the same existing ITM section in 
real storage. As shown in FIG. 5, a Real ITM Directory 
indicates which source sections have existing real ITM 
sections for them. Since two different target exploded virtual 
addresses can be backed by the same real target page frame, 
it is possible that two different target PTE's may define 
different virtual addresses backed by the same page frame. 
FIG. 8 indicates how this is kept track of in order to allow 
disallocation of an ITM page frame when source program 
execution makes that necessary. If an ITM page is to be 
invalidated, it is necessary to clear all PTE's denning target 
virtual addresses backed by the page in order to ensure 
correct operation. A Real ITM Cache Frame-to-PTE-List 
Directory, which has one entry per Real ITM page frame, 
contains the heads of pointer lists associating PTE entries 
addressing ITM entries backed by the same real page frame 
in the Real ITM Cache. A Single-Frame-PTE list entry is 
denned, one for each PTE entry, through which a list of 
associated PTE's can chain. In the figure, pointer 802, in the 
real ITM cache frame-to-PTE list directory entry associated 
with real ITM page 801 in the real ITM cache (810) 
addresses the 803 entry in the PTE list (which is associated 
with PTE 805). It in turn addresses the 804 entry in the list 
(associated with PTE 806), which indicates the end of the 
list. This list is a consequence of having two PTE's, 805 and 
806, in the Target Page Table 809 defining virtual ITM 
addresses VA1 and VA2, both backed by the same real ITM 
page 801. Should it be necessary to invalidate the ITM 
backed by page 801, the list is used to invalidate all PTE's 
that address it. The general mechanism is described more 
fully in application Ser. No. 08/349,771, which is assumed 
by this invention. 

FIG. 9 illustrates the target machine process for executing 
a source instruction. At step 901, the source machine instruc- 
tion address is preappended with the Address Space Identi- 
fication (ASID) of the executing program and a base quan- 
tity offset for the beginning of the Instruction Translation 
Map (ITM) in target virtual storage, and shifted to accomo- 
date for the width of an ITM entry (see FIG. 3) and used to 
address the ITM entry for the source machine instruction to 
be executed. At step 902 the addressed ITM entry is loaded 
into a target machine working register. If a page fault results 
because of this fetch (903), the ITM is not backed in target 
real storage. The page fault handling is shown on FIG. 11. 
If there is no page fault (and also following resolution of an 
ITM page fault event) the ITM entry is tested to find out if 55 
there is an existing target instruction translation for the 
current source instruction in step 904. If not, a translation 
miss is said to have occurred. The translation miss process- 
ing is shown on FIG. 10. If an instruction translation exists, 
the target instructions providing the effect of the source 
instructions being executed arc themselves executed at step 
905. The source next-instruction address is then calculated at 
step 906 and control transfers to step 901 to perform the 
execution process for the next set of source instructions. 

FIG. 10 illustrates the processing performed when a 
translation miss occurs. That is, there is no existing target 
machine instruction translation for the source machine 
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instruction to be executed. Step 1001 determines the target 
machine virtual address of the Instruction Translation 
Region (ITR) section that will hold the instructions of the 
translation when they are produced. It is uniquely identified 
by the pointer to the translator entry in the ITR section, 
which pointer is found in the ITM entry which caused the 
translation miss, and was used to access this logic. (ITR 
sections are backed by target storage when accessed during 
processing but pages of ITR sections are pageable by the 
target real storage manager. Page faults on these pages are 
resolved by normal prior-art methods without special pro- 
cessing so the processing of such page faults is not shown 
here). The addressing relationship is depicted in FIG. 5. Step 
1002 calls the instruction translator (1ST) to create the set of 
. target instructions that will produce exactly the same effect 
in execution as the source instruction being executed, and to 
place them in the ITR section. On return from the translator, 
the address in the ITR of the newly created instruction 
translation is placed into the ITM entry for the source 
instruction, at step 1004. For a source instruction which 
spans multiple instruction units, only the first ITM entry 
associated with the instruction is set to point to the transla- 
tion. The instruction translation supplied by the translator 
will be used repeatedly each time the same source instruc- 
tion is subsequently to be executed, assuming there has been 
no change to the source instruction in the meantime. The 
ITM indicates the location of the existing instruction trans- 
lation allowing it to be directly invoked without further 
action by the translator function. Step 1005 indicates the 
span of the translation in the ITM entry to indicate the 
number of source bytes in the instruction stream covered by 
the translation in the ITR to which the ITM now points. This 
is done by marking each ITM entry for instruction units that 
are pan of this same multi-unit source instruction as being 
associated with this source instruction as described in FIG. 
3. This can be used on stores into the instruction stream to 
indicate whether a translation has been invalidated by a store 
operation in the source program into an area that currently 
or previously contained instructions that were executed. The 
instruction translation is executed at step 1006, the next 
source machine instruction address is calculated at step 
1007, and control transfers to the top of FIG. 9, at step 901, 
to perform the execution process for the next set of source 
instructions. 

FIG. 11 shows the processing performed when a page 
fault occurs on the reference to an ITM entry in the pro- 
cessing depicted in FIG. 9. The base portion of the extended 
target virtual machine address that caused the page fault is 
extracted at step 1101. This is tested at step 1102 to check for 
an ITM page fault. If the base address portion is not that for 
the ITM table area in target virtual storage, some other type 
page fault has occurred and transfer is made to the real 
storage manager for routine page fault processing of the 
event. This is not shown here since many methods are 
previously described in prior art for demand paging of 
virtual storage. If step 1102 determines that the page fault 
was on an access to an ITM entry, step 1103 extracts the 
source ASID and source address, which indicate the ITM 
entry from the source exploded virtual address of the source 
instruction whose ITM entry is sought. For example, in 
S/390 ESA this would include the ASID and the virtual 
address of the instruction within that address space. Using 
the source machine Dynamic Address Translation (DAT) 
method, the source real address assigned to that source 
virtual address by the source operating system is calculated 
at step 1104. Using the source real address, the address of the 
real ITM directory entry for the ITM section for the source 
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real storage page is calculated in step 1105. There is a direct 
addressing relationship between source real storage pages 
and that directory of ITM sections in target real storage. The 
addressed directory entry is tested at step 1106 to ascertain 
whether or not an ITM section exists in real storage for that 
source real page. If not (the entry is NULL), control passes 
to FIG. 12 to handle the ITM miss condition. The initial state 
of the real ITM directory has all entries set to NULL. If the 
ITM section exists, a target machine Page Table Entry (PTE) 
is selected and allocated for this purpose at step 1107, and 
the contents of the PTE are constructed to address the 
required page of ITM entries at step 1108. At step 1109 store 
protection of the source page is established to allow detec- 
tion by the target machine of store operations by target 
instructions of source machine translations into the page of 
source instructions represented by the ITM being established 
in the PTE. The PTE is made part of the single-frame PTE 
list for the target real page to which the PTE resolves 
references through it, at step 1110. The structure of the 
single-frame PTE list is illustrated in FIG. 8. Since multiple 
target virtual addresses of ITM section pages can resolve to 
the same ITM target real page frame address, a mechanism 
is needed to indicate all PTEs resolving to a single frame so 
that, if a frame is deallocated and reassigned to another use, 
all PTEs resolving to it can be invalidated. The single-frame 
PTE list is utilized to provide this function by relating all 
PTEs resolving to the same page frame real storage address. 
Control returns to step 904 of FIG. 9. 

FIG. 12 depicts the process for an ITM miss situation. If 
the ITM directory entry for an ITM section is NULL, real 
storage for an ITM cache section must be allocated. The 
storage is selected and allocated for the ITM section at step 
1201 from the pool of target real pages assigned for this 
purpose. At step 1202 the target machine virtual address of 
the ITR section that is to hold the translations of source 
instructions from the source page represented by the ITM 
section being allocated in target real storage is computed. As 
explained earlier (see FIG. 5), there is a direct addressing 
relationship between a source real page and a virtual ITR 
section whose address can be used to access the translations 
of source instructions in that page. 

The source real page address computed in step 1104 of 
FIG. 11 is used to determine the virtual ITR section address. 

Step 1203 calls a function that initializes the header of an 
ITR section in preparation for its receiving instruction 
translations. Step 1204 initializes the entries in the ITM 
section to transfer to the translator function, which will 
occur at source instruction execution until an entry is 
replaced by the target virtual address of a translation in the 
ITR section, by action of the translator, the first time the 
source instruction is encountered for execution. The format 
of an entry in the ITM section is depicted in FIG. 5. Step 
1205 sets the ITM directory entry for the real storage section 
(from 1104) to address the newly allocated real ITM cache 
section. An entry in the real ITM directory is illustrated in 
FIG. 7. All source exploded virtual storage pages repre- 
sented by the newly-backed ITM section are store-protected 
in order to provide notification of any store that should cause 
invalidation of an existing source instruction translation, as 
was described in the Summary. This is done at step 1206. 
Return is then made to step 1107 on FIG. 11 to continue ITM 
page fault processing. 

FIG. 13 indicates the processing that occurs when a store 
protection exception is encountered during execution of the 
instructions of translations of source instructions, i.e., the 
source program is modifying an area that may contain 
instructions that may have already been translated to target 
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instructions. Step 1301 calculates the source real address of 
the target virtual address that resulted in the protection 
exception. At step 1302, the source real address of the 
reference is used to determine the real ITM directory entry 
for the ITM for instructions in the source page of the 
reference. Step 1303 tests that real ITM directory entry. If it 
is NULL, no instruction translations exist for that source 
storage section. The protection violation may be a source 
architecture violation. The rules of the source architecture 
are used to check for such a violation at step 1304. If there 
has been such a violation, it is reported to the source 
operating system under the source architecture rules for 
handling such notification within the source operating envi- 
ronment. If not, the store protection is removed at step 1305, 
the store is performed in the source storage within target 
storage at step 1306, and return is made to the instruction 
following that which caused the exception in the target 
translation of the subject instruction specifying the store 
operation. If the real ITM directory entry was found not to 
be NULL at step 1303, a check is made using source 
architecture rules to ascertain whether or not a source 
protection exception occurred at step 1307. If so, this is 
communicated to the source operating system in accordance 
with the source rules for source communication of such an 
event in the source machine. If it is not a source program 
violation, then step 1308 tests for a translation corruption, 
i.e. whether or not the store changes a source location for 
which there exists a current target machine translation. The 
storage extent of the store operation is calculated and that 
extent is checked in the corresponding ITM section to see 
whether the change is affecting any part of the span of any 
translated source instruction(s). If not, the store is performed 
at step 1309 in the source storage as mapped in target 
storage, and control returns to the next target instruction to 
be executed within a source instruction translation. If there 
has been an instruction translation corruption as a result of 
the store operation in source machine storage, the translation 
management function is called to invalidate those transla- 
tions affected, at step 1310. Control then passes to step 1309 
to perform the specified store operation in source machine 
real storage. 

Finally, it is necessary to extend the PTE allocation 
process described in FIG. 27 of application Ser. No. 08/349, 
771, which is assumed by this invention. When a new PTE 
is allocated to back a target exploded virtual address with a 
target real page frame which represents a source real page, 
the real ITM directory described in this invention must be 
consulted to determine if the real page has an real ITM cache 
section allocated to it. If so, the PTE must be store protected 
to trap stores into a page which contains source instructions. 

While the invention has been described in detail herein in 
accordance with certain embodiments thereof, many modi- 
fications and changes therein may be effected by those 
skilled in the art. Accordingly, it is intended by the appended 
claims to cover all such modifications and changes as fall 
within the true spirit and scope of the invention. 

Having thus described our invention, what we claim as 
new and desire to secure by Letters patent is: 

1. An emulation method for executing individual source 
instructions in a target processor to execute source programs 
requiring source processor features not built into the target 
processor, comprising the steps of: 

inputting instructions of a source processor program to an 
emulation target processor having significant excess 
virtual addressing capacity compared to a virtual 
addressing capacity required for a source processor to 
natively execute the source processor program, and 
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supporting multiple source virtual address spaces in the 
operation of the source processor, 

building a virtual ITM (instruction translation map) in a 
target virtual address space supported by the target 
processor, the virtual ITM containing an ITM entry for 5 
each source instruction addressable unit, each source 
instruction addressable unit beginning on a source 
storage instruction boundary, structuring each ITM 
entry for containing a translation address to a target 
translation program that executes a source instruction 10 
having a source address associated with the ITM entry, 

deterrnining a ratio R by dividing the length of each ITM 
entry by the length of each source instruction addres- 
sable unit, 

accessing an ITM entry for an executing source instruc- 15 
tion by: 

generating a source aggregate virtual address for the 
source instruction by combining the source address 
of the source instruction with a source address space 
identifier of a source virtual address space containing 2 o 
the instruction, 

multiplying the source aggregate virtual address by R 
to obtain a target virtual address component, and 

inserting the target virtual address component into a 
predetermined component location in a target virtual 25 
address to generate an ITM entry target virtual 
address for locating an ITM entry associated with the 
source instruction in order to obtain a one-to-one 
addressing relationship between ITM entry target 
virtual addresses and source instruction addresses. 30 

2. A source emulation system for executing source 
instructions in a target processor as defined in claim 1, 
further comprising the step of: 

structuring each ITM entry with at least two fields: a 
pointer field for containing a target instruction transla- 35 
tion address, and a span field for containing a span 
indication that indicates if the ITM entry is part of a set 
of ITM entries associated with the same source instruc- 
tion for handling source variable length instructions. 

3. A source emulation system for executing source 40 
instructions in a target processor as defined in claim 2, 
further comprising the step of: 

using the source real address of a source instruction 
represented by an ITM entry to determine if an ITM 
section exists for the virtual ITM section containing the 45 
entry, and 

translating an ITM entry target virtual address to a back- 
ing ITM entry target real address of a backing ITM 
entry in an assigned target page frame (PI 7 ) in real 
storage of a target processor, the target PF containing a 50 
subset of backing ITM entries for backing a section of 
the virtual ITM in target real storage, each backing ITM 
entry in the target PF supporting the pointer field and 
the span field in a corresponding virtual ITM entry. 

4. A source emulation system for executing source 55 
instructions in a target processor as defined in claim 3, 
further comprising the step of: 

signalling a page fault if the ITM section target virtual 
address has no backing target PF, 

60 

using source DAT to determine the source real address of 
the currently executing instruction, 

using the source real address of the instruction to index to 
an entry in the real ITM directory to determine if a real 
ITM cache section is already defined for the source real 65 
storage section which includes the currently executing 
source instruction, 



allocating a target PF from the real ITM cache if one does 
not exist, 

initializing the target PF when it is assigned to back the 
ITM target virtual address by setting each pointer field 
in the target PF to a value interpreted as a translator 
address for locating, a translator program in the target 
virtual storage for generating target translations which 
are target programs that execute source instructions, 
and 

indicating the target PF in the real ITM directory entry 
associated with the source real storage section contain- 
ing the currently executing source instruction. 

5. A source emulation system for executing source 
instructions in a target processor as defined in claim 4, 
further comprising the step of: 

signalling a page fault if the ITM section target virtual 

address has no backing target PF, 
using source DAT to determine the source real address of 

the currently executing instruction, 
using the source real address of the instruction to index to 

an entry in the real ITM directory to determine if a real 

ITM cache section is already defined for the source real 

storage section which includes the currently executing 

source instruction, and 
backing the virtual section ITM section for the currently 

executing source instruction with the same real ITM 

cache section if one exists. 

6. A source emulation system for executing source 
instructions in a target processor as defined in claim 5, 
further comprising the steps of: 

accessing the pointer field of the backing ITM entry for 
obtaining a translator address to enter the translator 
program, 

assigning a virtual instruction translation region (virtual 
ITR) in target virtual storage for locating the translator 
program and instruction translations of previously 
executed source instructions, 

executing the translator program for generating a target 
translation for a currently executing source instruction 
when the backing ITM entry does not contain a pointer 
to an instruction translation, 

storing the target translation in a page within a virtual ITR 
section, associated with the source section containing 
the instruction and located by a translation target virtual 
address, 

writing the translation target virtual address in the pointer 
field of the backing ITM entry for the executing source 
instruction to overlay the translator address to enable a 
direct access of the target translation from the backing 
ITM entry for enabling a later execution of the source 
instruction to bypass the translator program, and 

setting the span field in the backing ITM entry to a span 
indication indicating if the ITM entry is part of a set of 
consecutive backing ITM entries associated with the 
executing source instruction, and setting the span field 
in each other backing ITM entry in the set to a span 
indication that indicates which backing ITM entries are 
part of the consecutive set. 

7. A source emulation system for an executing source 
instruction in a target processor as defined in claim 6, further 
comprising the steps of: 

accessing the translation target virtual address in the 
pointer field in the backing ITM entry when subse- 
quently executing a previously executed source instruc- 
tion associated with the ITM entry, 
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performing dynamic address translation (DAT) on the 
translation target virtual address to locate a backing 
target PF in target real storage containing at least the 
beginning of the target translation for the previously 
executed source instruction, and 5 

executing target instructions of the target translation in the 
target PF to emulate execution of the executing source 
instruction. 

8. A source emulation system for executing source 
instructions in a target processor as defined in claim 7, 10 
further comprising the step of: 

signalling an ITR page fault if the translation target virtual 
address has no backing target PF for the ITR page. 

9. An emulation system for an executing source instruc- 
tions in a target processor as defined in claim 2, further 15 
comprising the step of: 

accessing the backing ITM entry in the backing section 
target PF at the target real address associated with the 
source instruction located at a different source virtual 
address location by using a source DAT process, 
wherein the backing section of the target PF is used to 
back multiple copies of the same instruction in different 
source virtual locations to avoid having multiple copies 
of the backing section in target real storage for multiple 25 
source copies of the same source instruction. 

10. A source emulation system for executing source 
instructions in a target processor as defined in claim 6, 
further comprising the steps of: 

determining if an existing ITR virtual pointer in an 30 
existing real ITM entry in a backing PF locates an 
existing instruction translation previously determined 
for the same source instruction as the currently execut- 
ing source instruction but at a different source virtual 
address, and 35 

backing the ITM section for the currently executing 
source instruction by the same PF, and the same 
instruction translation being used for multiple source 
virtual addresses containing the same source instruc- 
tion. 40 

11. A source emulation system for executing source 
instructions in a target processor as defined in claim 7, 
further comprising the steps of: 



protecting target page frames (PFs) in target storage 
which contain previously translated source instructions 
from being stored into by the target processor, 

detecting each source real storage address of each source 
instruction subject to a source store operation, 

determining all target virtual addresses of all ITM entries 
associated with previously translated source instruc- 
tions detected as being stored into, 

using the span field of the ITM entry associated with the 
first source byte store into to determine the first source 
byte of the first source instruction unit affected by a 
store operation, 

setting the ITM entry for that first affected source instruc- 
tion to an untranslated state, 

setting to an untranslated state the pointer field in a first 
ITM entry associated with the first instruction unit of 
each other translated instruction affected by a store 
operation, 

also setting the span field to an uri associated state in all 
ITM entries associated with each source instruction 
stored into, the settings in the pointer and span fields of 
all ITM entries associated with all stored-into source 
instructions being set to indicate no association with 
any source instruction, and 

then performing corresponding store operations in target 
real storage for the same instructions. 

12. A source emulation system for executing source 
instructions in a target processor as defined in claim 7, 
further comprising the step of: 

maintaining a fixed allocation relationship of ITR virtual 
sections to source real storage sections, using the fixed 
allocation relationship to directly access an ITR virtual 
section given a source real storage section address, and 
using the real ITR pages as needed to back virtual ITR 
pages to hold instruction translations for efficiently 
using real storage, while using target virtual storage to 
provide direct addressing access of the ITR pages. 



12/02/2003, EAST Version: 1.4.1 



UNITED STATES PATENT AND TRADEMARK OFFICE 

CERTIFICATE OF CORRECTION 



PATENT NO. : 5,560,013 
DATED : Sep. 24, 1996 

INVENTOR(S): Casper A. Scalzi et aL 

It is certified that error appears in the above-indentified patent and that said Letters Patent is hereby 
corrected as shown below: 



Column 6, line 44 "2**(46HS536" should be -2**16 = 65536-. 



Attest: 



Signed and Sealed this 
Seventh Day of January, 1997 



BRUCE LEHMAN 

Attesting Officer Commissioner of Patents and Trademarks 



12/02/2003, EAST version: 1.4.1 



Illl 



United States Patent [19] 

Subramanian 



US006026240A 
[ii] Patent Number: 
[45] Date of Patent: 



HI 



6,026,240 
Feb. 15, 2000 



[54] METHOD AND APPARATUS FOR 
OPTIMIZING PROGRAM LOOPS 
CONTAINING OMEGA-INVARIANT 
STATEMENTS 

[75] Inventor: Krishna Subramanian, Mountain 
View, Calif. 

[73] Assignee: Sun Microsystems, Inc., Palo Alto, 
Calif. 

[21] Appl. No.: 08/609,035 
[22] Filed: Feb. 29, 1996 

[51] Int. CI. 7 G06F 9/45 

[52] U.S. CI 395/709; 395/705 

[58] Field of Search 395/702, 705, 

395/709 

[56] References Cited 

U.S. PATENT DOCUMENTS 

5,202,995 4/1993 O'Brien 395/709 

5,361,354 11/1994 Greyzck 395/705 

5,457,799 10/1995 Srivastava 395/705 

5,596,732 1/1997 Hosoi 395/705 



OTHER PUBLICATIONS 

Zima, Hans, and Barbara Chapman, Supercompilers for 
Parallel and Vector Computers, New York, NY, ACM Press 
pp. 112-172, 1991. 

Aho, Alfred V, et al., Compilers: Principles, Techniques and 
Tools. Reading, MA: Addison Wesley (1986), pp. 638-642. 

Primary Examiner— Alvin E. Oberley 

Assistant Examiner — Peter Stecher 

Attorney, Agent, or Firm — Beyer & Weaver, LLP 



[57] 



ABSTRACT 



Apparatus, methods, and computer program products are 
disclosed for optimizing programs containing single basic 
block natural loops with a determinable number of itera- 
tions. The invention optimizes, for execution speed, such 
program loops containing statements that are initially 
variant, but stabilize and become invariant after some num- 
ber of iterations of the loop. The invention optimizes the 
loop by unwinding iterations from the loop for which the 
statements are variant, and by hoisting the stabilized state- 
ment from subsequent iterations of the loop. 

10 Claims, 9 Drawing Sheets 
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METHOD AND APPARATUS FOR 
OPTIMIZING PROGRAM LOOPS 
CONTAINING OMEGA-INVARIANT 
STATEMENTS 

FIELD OF THE INVENTION 

This invention relates to the field of Optimizing Compil- 
ers for computer systems. Specifically, this invention is a 
new and useful optimization method and apparatus for 
optimizing the order of computer operation codes resulting 
from the compilation of a program loop. 

BACKGROUND 

Early computers were programmed by rewiling them. 
Modern computers are programmed by arranging a sequence 
of bits in the computer's memory. These bits perform a 
similar (but much more useful) function as the wiring in 
early computers. Thus, a modern computer operates accord- 
ing to the binary instructions resident in the computer's 
memory. These binary instructions are termed operation 
codes (opcodes). The computer fetches an opcode from the 
memory location pointed to by a program counter. The 
computer's central processor unit (CPU) evaluates the 
opcode and performs the particular operation associated 
with that opcode. Directly loading binary values in memory 
to program a computer is both time consuming and mind 
numbing. Programming languages simplify this problem by 
enabling a programmer to use a symbolic textual represen- 
tation (the source code) of the operations that the computer 
is to perform. This symbolic representation is converted into 
binary opcodes by compilers or assemblers. By processing 
the source code, compilers and assemblers create an object 
file containing the opcodes corresponding to the source 
code. This object file, when linked to others, results in 
executable instructions that can be loaded into a computer's 
memory and executed by the computer. 

A source program consists of an ordered grouping of 
strings (statements) that are converted into a binary repre- 
sentation (including both opcodes and data) suitable for 
execution by a target computer architecture. A source pro- 
gram provides a symbolic description of the operations that 
a computer will perform when executing the binary instruc- 
tions resulting from compilation and linking of the source. 
The conversion from source to binary is performed accord- 
ing to the grammatical and syntactical rules of the program- 
ming language used to write the source. This conversion 
from source to binary is performed by both compilers and 
assemblers. 

One significant difference between assemblers and com- 
pilers is that assemblers translate source code statements 
into binary opcodes in a one-to-one fashion (although some 
"macro" capabilities are often provided). On the other hand, 
compilers transform source code statements into sequences 
of binary opcodes that, when executed in a computer, 
perform the operation described by the source. The symbolic 
statements processed by a compiler are more general than 
those processed by an assembler and each compiled state- 
ment can produce a multitude of opcodes that, when 
executed by a computer, implement the operation described 
by the symbolic statement. Unlike an assembler, that main- 
tains the essential structural organization of the source code 
when producing binary opcode sequences, a compiler may 
significantly change the structural organization represented 
by the source when producing the compiled binary. 
However, no matter how much the compiler changes this 
organization, the compiler is restricted in that the compiled 
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binary, when executed by a computer, must provide the same 
result as the programmer described using the source 
language — regardless of how this result is obtained. 

Many modern compilers can optimize the binary opcodes 
5 resulting from the compilation process. Due to the design of 
programming languages, a compiler can determine struc- 
tural information about the program being compiled. This 
information can be used by the compiler to generate different 
versions of the sequence of opcodes that perform the same 

10 operation. (For example, enabling debugging capability, or 
optimizing instructions dependant on what version of the 
target processor the source code is compiled for.) Some 
optimizations minimize the amount of memory required to 
hold the instructions, other optimizations reduce the time 

15 required to execute the instructions. The invention disclosed 
herein optimizes so as to maximize execution speed for a 
particular type of loop operation. 

Some advantages of optimization are that the optimizing 
compiler frees the programmer from the time consuming 

20 task of manually tuning the source code. This increases 
programmer productivity. Optimizing compilers also 
encourage a programmer to write maintainable code because 
manual tuning often makes the source code less understand- 
able to other programmers. Finally, an optimizing compiler 

25 improves portability of code because source code tuned to 
one computer architecture may be inefficient on another 
computer architecture. 

Compilers generally have three segments: (1) a front-end 

30 that processes the syntax and semantics of the language and 
generates at least one version of an "intermediate" code 
representation of the source; (2) a back-end that converts the 
intermediate code representation into binary computer 
instructions (opcodes) for a particular computer architecture 

35 (i.e., SPARC, X86, IBM, etc.); and (3) various code opti- 
mization segments between the front- and back-ends of the 
compiler. These optimization segments operate on, and 
adjust, the intermediate code representation of the source. 
For loops, the intermediate code representation generally 

40 includes data structures that either represent, or can be used 
to create, data dependency graphs (DDGs). DDGs embody 
the information required for an optimizer lo determine 
which statements are dependent on other statements. The 
nodes in the graph represent statements in the loop and arcs 

45 represent the data dependencies between nodes. Data depen- 
dency graphs are described in chapter 4 of Superconpilers 
for Parallel and Vector Computers, by Hans Zima, ACM 
press, ISBN 0-201-17560-6, 1991. 

One example of a prior art optimization is for the compiler 

50 to process the source code as if the programmer had written 
the source in a more efficient manner. For example, common 
subexpression elimination replaces subexpressions that are 
used more than once with a temporary variable set to the 
subexpression's value. Thus: 

55 a=i*2+3; 
b=sqrt(i*2); 
compiles as if written as: 
temp=i*2; 

60 a=temp +3; 
b-sqrt(temp); 

Another optimization is by code motion. This optimiza- 
tion hoists, from out of the enclosing loop, expressions that 
are loop-invariant for each iteration. Thus: 
65 while (!feop(fp)) DoSomething (fp, x*5); 
compiles as if written as: 

temp»x*5; 
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while (!feof(fp)) DoSomething (fp, temp); 
Yet another optimization (trading memory for speed) is to 
expand the expressions contained in the loop so that less 

time is spent performing loop overhead operations. For for (i - 0; i < = 30; i++) { 

example: 5 y = x*5; //inst l 

x - 2; //inst 2 

a-0; } 



for (i=0; i<100; i++) a=a+i; 

can be compiled as if written as: Here, on the first iteration of the loop, y receives the value 

30 of (1*5=5), and x receives the value 2. During the second 
iteration, y receives the value (2*5=<10), and x again receives 
— — — ^— _________ ___ value 2. On the third iteration and continuing until the end 

* a ? . of the loop, y receives the same value (10). In this example, 

for (i~0; i < 100; i + - 5) { 4 , *? ' t flU . • • ! 

a _a + i- JX the second statement of the loop x»2; ' is not loop-invanant 

a - a + i + 3; 15 because the value of x that reaches "inst 1" in the first 

a *= a + i + 2; iteration is different than the value of x that reaches "instr 1" 

in subsequent iterations. Thus, the prior alt cannot hoist the 
operations representing this statement from the loop's ker- 

nel. The same applies to variable y. However, x does not 

20 change subsequent to the first iteration over the loop. Thus, 
Finally, the compiler could just unwind iterations from the x and y become invariant after the first iteration. Keeping the 
loop so that there would be no loop overheads. Thus, operations represented by these statements within the loop's 



a = a + i + 3; 
a - a + i + 4; 

} 



=0; 



kernel, after the first iteration, slows execution of the loop 
because each execution of an instruction takes time and 



l = a + 3 
a = a + 2; 
a = a + 3; 
a = a + 4; 
a = a + 5 

} 



for (i=l; i<6; i++) a=a+i; 25 computer resources. After the initial variant iterations of the 

can be compiled as if written as: loo P> tnese statements have no further utility. Thus the time 

spent processing these statements is wasted and the loop is 
correspondingly inefficient. 

Statements that have assignments that are variant in the 

a ■ 30 first omega iterations, but that become invariant after omega 

{ fl=flJ . v iterations are termed "omega-invariant". Omega-invariant 

statements "stabilize" after "omega" iterations. Hence, the 
value delivered by the statement stabilizes after omega 
iterations and becomes invariant for subsequent iterations. 

35 In the example above, the omega for "inst 1" is one because 

the value of y remains the same (10) after one iteration of the 

loop. The omega for "instr 2" is zero because it is assigned 

In the above examples, the plus "+" is used to indicate any the value of 2 in the first and all subsequent iterations of the 

language operator. Further, there are other optimizations that 1°°P- 

could be performed on each example. The above examples 40 ^ P rior art does not optimize this sequence of state- 
indicates the operation of each separate optimization. In an ™ us ' art c 1 ° m P ilers . & eDe ' ate . less effi ™ nt 
... .. t . . | . . . (slower) loops than compilers practicing the invention. The 
optimizing compiler, many optimizations are applied to best > . ' . r u , u f ,. r . 4 , & 4 . , r 
r . . ° r J r rr invention described herein optimizes the execution speed or 

optimize the resultant executable instructions. SBBN loops containing omega-invariant statements with a 

Of course, the original source code is not modified by the 45 determinable number of iterations, 

compiler, rather the compiler sufficiently understands the SUMMARY OF THE INVENTION 

program structure, as described by the source, to make these _ . , , 

t . ■ ,. f #u ... „ , . t . The present invention overcomes the disadvantages of the 

optimizations for the programmer without chancing the , *; , , , 4 . 

, , , m ... .. above described systems and has the specific utility, among 

programmers intended result. These optimizations result in JQ otherS) of providjng an economical( hign performance, 

the production of binary instructions that execute faster in adaptable system, apparatus, method, system and computer 

the target computer than the non-optimized instructions that program product for increasing the execution speed of target 

perform the same operations. programs by reducing the number of instructions repeatedly 

It is well understood in the art how to hoist loop-invariant executed during some programmed loops. One aspect of the 

operations out of a loop operation. A general discussion of 55 present invention identifies program loop statements that 

optimizing compilers and the related prior art techniques can contain omega-invariant statements and optimizes the 

be found in Compilers: Principles, Tecliniques and Tools by execution speed of these loops. 

Alfred V. Aho, Ravi Sethi and Jeffcry D. Ullman, Addison- . ,n another as P ect of ,hc i° v °ntion, a computer system is 

Wesley Publishing Co., 1988, ISBN 0-201-10088-6, here- du5C J oscd ^ has a P rocc f ' n S ™' and 

• ft au r\ • «• 4.-1 * * *■ -60 random access memory (RAM) coupled to the CPU, for use 

inafter Aho. Optimization ot loop-invanant computations is -i* . * . 

, , . A , T r in compihng a target program to run on a target computer 

described in Aho at pages 638-642. ... f e 4L . . , • ■ 

r ° architecture. In addition, this system has an optimizing 

However, it is not known to the art how to hoist certain compiler capable of identifying single -basic-block-natural 

statements that are almost loop-invariant. In particular, some i 00 ps with a determinable tripcount and containing an 

compiler source sequences are not initially loop invariant, 65 omega-invariant operation. Further, this compiler is capable 

but become invariant after a given number of iterations. For of detecting the contained omega-invariant statement and of 

example: determining the number of iterations required to stabilize the 
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omega -invariant statement. Finally, the compiler is capable 
of optimizing the loop for increased execution speed depen- 
dent upon the number of iterations required to stabilize the 
omega-invariant statement. 

In one aspect of the invention, an apparatus is disclosed 
using a computer and compiler system to optimize single- 
basic-block loops containing omega -invariant statements 
that stabilize after a determinable number of iterations. 

In another aspect of the invention a code optimizer for use 
in a compiler operating on a computer system is disclosed 
for detecting and optimizing single-basic-block loops con- 
taining omega-invariant statements. 

In yet another aspect of the invention, a method is 
disclosed for identifying and optimizing single-basic-block 
loops containing omega-invariant statements. 

Finally, another aspect of the invention discloses a com- 
puter program product to cause a computer to detect and 
optimize single-basic-block loops containing omega- 
invariant statements. 

DESCRIPTION OF THE DRAWINGS 

The objects, features and advantages of the system of the 
present invention will be apparent from the following 
description in which: 

FIG. 1 illustrates a portion of a computer, including a 
CPU and conventional memory in which the invention may 
be embodied; 

FIG. 2 illustrates a typical compiler showing the position 
of the code optimizer; 

FIG. 3 illustrates a large scale organization of a code 
optimizer; 

FIG. 4 illustrates a Data Dependency Graph constructed 
from a sample SBBN loop; 

FIG. 5a illustrates the general characteristics and organi- 
zation of computer instructions representing a non- 
optimized SBBN loop statement; 

FIG, 5b illustrates the organization of computer instruc- 
tions representing an SBBN loop optimized in accord with 
the invention having a compile -time determined tripcount; 

FIG. 5c illustrates the organization of computer instruc- 
tions representing an SBBN loop optimized in accord with 
the invention having an execute-time determined tripcount; 

FIG. 6 Illustrates the compiler process for detecting an 
omega-invariant statement in a SBBN loop; 

FIG. 7 Illustrates the general compiler process for opti- 
mizing execution of an omega- invariant statement within a 
SBBN loop using the invention; 

FIG. la Illustrates the compiler process for creating an 
optimized SBBN loop having a compile-time determined 
tripcount; and 

FIG. lb illustrates the compiler process for creating an 
optimized SBBN loop with an execute-time determined 
tripcount. 

NOTATIONS AND NOMENCLATURE 

The detailed descriptions that follow are presented largely 
in terms of procedures and symbolic representation of go 
operations on data bits within a computer memory. These 
procedural descriptions and representations are the means 
used by those skilled in the data processing arts to most 
effectively convey the substance of their work to others 
skilled in the art. 65 

A Data Dependency Graph (DDG) is a data structure in 
the computer memory that represents how statements within 
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a loop depend on other statements. These graphs include 
nodes that represent computer operations and arcs that 
represent dependencies between the nodes. These dependen- 
cies include flow dependencies, data dependencies and anti- 
dependencies. Data structures within the compiler that rep- 
resent data dependency graphs are used to determine loop- 
invariant and omega-invariant statements within a loop. 
These data structures are often represented by diagrams 
using circles for nodes corresponding to statements and arcs 
between the nodes representing dependencies. FIG. 4 illus- 
trates an example DDG and is described below. 

Instructions are the compiled binary operation codes 
(opcodes) for a target computer architecture that implement 
the operation described by a statement. Often one compiled 
statement will describe multiple operations and generate 
many computer instructions. 

A loop operation, when compiled and the resulting com- 
puter instructions executed on a computer, causes the com- 
puter to repeatedly execute the instructions enclosed within 
the loop. Each repetition of the enclosed instructions is a 
single iteration of the loop. 

An iterative construct is a series of operations that effec- 
tuate a loop operation defined by a loop statement and the 
statements it encloses. 

A loop is a programming language construct that 
describes an iterative process where statements within the 
body of the loop define operations that are repeatedly 
executed by a computer. In other words a compiled loop, 
when executed in a computer, causes the computer to 
repetitively iterate through the operations described by state- 
ments contained in the loop until some completion condition 
is satisfied. As such, loop statements represent an iterative 
construct that provides an iterative control process coupled 
to other statements contained within the body of the loop. 
Loops, as optimized by the invention, are limited to single- 
basic-block- natural-loops (SBBN loops) that have a deter- 
minable number of iterations (i.e., having a compile-time 
computable or known symbolic tripcount). That is, loops 
that do not contain any control flow structures, functions, 
procedures, or other constructs that change the flow of 
execution within the loop. Such loops have only one entry 
point, one exit point, and no branches within the loop. 
The compiled instructions that implement loops are orga- 
nized into a prologue that initializes the loop and a 
kernel that contains the opcodes representing the com- 
piled statements within the loop's body and the test for 
the completion condition on the loop. The flow of 
execution is such that the prologue instructions are 
executed before the kernel instructions. Thus loop- 
invariant operations within the loop's kernel can be 
moved (hoisted) to the loop's prologue with the same 
functional result as if they were left within the loop. 
Hoisting is accomplished by using standard code 
motion algorithms such as the one described in Section 
10.7 of Aho. 

A loop is unwound when the operations within the loop's 
kernel, for some number of initial iterations, are hoisted 
to the prologue. Thus, some loop iterations are peeled 
off the loop's kernel. The number of iterations peeled 
off of the loop is termed the peelfactor. For each peel, 
the kernel operations are duplicated. For loops that 
have a variable number of iterations (e.g., that the 
iterations are determined at execution-time and not at 
compile-time), two versions of the instructions that 
effectuate the loop are created. One version effectuating 
a loop where the omega-invariant operation remains 
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within the kernel and the other version including peeled The invention also relates to apparatus for performing 
iterations, an optimized omega-invariant operation and these operations. This apparatus may be specially con- 
the optimized kernel. If the number of iterations sped- structed for the required purposes or it may comprise a 
fied for a particular invocation of the loop is less then general purpose computer as selectively activated or recon- 
the number of iterations peeled from the loop, the 5 figured by a computer program stored in the memory of a 
original loop is selected. Otherwise, the optimized computer. The procedures presented herein are not inner- 
version of the loop, consisting of the peels, the hoisted ently related to a particular computer or other apparatus. In 
omega-invariant operation and the kernel, is executed. particular, various general purpose machines may be used 
The tripcount is the number of iterations required of the with programs written in accordance with the teachings 
loop at the time the loop is entered. If the tripcount is ]Q herein, or it may prove more convenient to construct more 
greater than the peelfactor, the modified iterative pro- specialized apparatus to perform the required method steps, 
cess (the peels, the hoisted omega-invariant operation The required structure for a variety of these machines will 
and the optimized kernel) will be selected, otherwise appear from the description below. 

the original unpeeled loop will execute. For loops that Finally, the invention may be embodied in a computer 

have a tripcount that can be determined at compile readable medium encoded with an optimizing compiler 

time, the compiler determines if and how much to peel 15 application program, 
the loop. 

A loop-invariant statement is a statement within a loop DESCRIPTION OF THE PREFERRED 

that does not depend on any other variant statement in the EMBODIMENT 

loop and generates the same result in each iteration of the Operating Environment 

loop. Prior art techniques exist to hoist compiled 20 The invention can be practiced with any programming 

instructions, corresponding to loop- invariant statements, to language that utilizes loop constructs. Some of these pro- 

the loop's prologue. gramming languages include, but are not limited to, 

An omega -invariant statement is a statement that is ini- FORTRAN, PASCAL, C, C++, ADA and compiled BASIC, 

tially variant for some omega iterations of a loop, but that Example loop constructs in C and C++ are the "for", the 

stabilizes and remains invariant after omega iterations of the 25 "do-while", and the "while" statements. The optimization 

loop. provided by the invention applies to single-basic-block - 

An operation is described by a statement and is repre- natural-loops having a determinable number of iterations, 

sented by the corresponding intermediate code. The back- Thus, these loops have either a tripcount determined at 

end portion of a compiler converts the operations described compile-time or a tlipcount determined at execute-time from 

by the intermediate code into sequences of executable 30 a known operation. These loops have only one entry point, 

instructions for a target computer architecture. These one exit point, and no branching of the flow of execution 

instructions, when executed on the target computer effectu- within the loop. 

ate the operation. The invention is used with a computer system. Some of 
A procedure is a self-consistent sequence of steps leading the elements of such a computer system are shown in FIG. 
to a desired result. These steps are those requiring physical 35 1, wherein a processor 1 is shown, having an Input/Output 
manipulation of physical quantities. Usually these quantities ("I/O") section 2, a central processing unit ("CPU") 3 and a 
take the form of electrical or magnetic signals capable of memory section 4. The I/O section 2 is connected to a 
being stored, transferred, combined, compared, and other- keyboard 5, a display unit 6, a disk storage unit 9 and a 
wise manipulated. These signals are referred to as bits, CD-ROM drive unit 7. The CD-ROM unit 7 can read a 
values, elements, symbols characters, terms, numbers, or the 40 CD-ROM medium 8 that typically contains programs 10 and 
like. It will be understood by those skilled in the art that all data. Such a computer system is capable of executing a 
of these and similar terms are associated with the appropri- compiler program that embodies the invention, 
ate physical quantities and are merely convenient labels FIG. 2 illustrates a typical optimizing compiler 20, corn- 
applied to these quantities. prising a front-end compiler 24, a code optimizer 26 and a 
Statements are the textual strings that have the syntax and 45 back-end code generator 28. The front-end 24 of a compiler 
semantics of a programming language. An example state- takes, as input, a program written in a source language 22 
ment is "while (TRUE) {} ;" an instruction implementing containing a series of statements, including loop statements, 
the operation described by this statement would be the and performs various lexical, syntactical and semantic 
binary opcode for the target computer that causes the com- analysis on these statements creating an intermediate code 
puter to repeatedly execute the same instruction. In some 50 32 representing the operation of the target program. This 
assembler language, this operation could be represented by intermediate code 32 is used as input to the code optimizer 
"A: BR A". 26 that attempts to adjust (improve) the intermediate code 
The manipulations performed by a computer in executing resulting in a faster-executing set of machine instructions 30. 
opcodes are often referred to in terms, such as adding or Some code optimizers 26 are trivial while others do a variety 
comparing, that are commonly associated with mental 55 of computations in an attempt to produce the most efficient 
operations performed by a human operator. No such capa- target program possible. Those of the latter type are called 
bility of a human operator is necessary in any of the "optimizing compilers" and include such code transforma- 
operations described herein that form part of the present tions as common sub-expression elimination, dead-code 
invention; the operations are machine operations. Useful elimination, hoisting loop-invariant statements from a loop's 
machines for performing the operations of the invention 60 kernel, renaming temporary variables and interchanging two 
include programmed general purpose digital computers or independent adjacent statements as well as performing reg- 
similar devices. In all cases the method of computation is ister allocation. The intermediate code is either adjusted or 
distinguished from the method of operation in operating a recreated during the optimization process. Finally, the 
computer. The present invention relates to method steps for adjusted intermediate code is passed to the back-end code 
operating a computer in processing electrical or other (e.g., 65 generator 28 where the operations described by this inter- 
mechanical, chemical) physical signals to generate other mediate code are converted to binary opcodes corresponding 
desired physical signals. to specific instructions for the target computer architecture. 
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FIG. 3 depicts a typical organization of an optimizing 
compiler 40. On entry of the intermediate code 42, a data 
dependency graph is constructed 44. At this stage the code 
transformations mentioned above take place 46. Next, if 
supported by the compiler, instruction scheduling or "pipe- 
lining" may take place 48, followed by "Register allocation" 
50. Finally, the adjusted intermediate code is sent 52 to the 
compiler back-end for conversion to the binary instructions 
of the target machine architecture. 

The invention can be practiced in other segments of a 
compiler. For example, the invention can be practiced during 
the semantic and lexical scanning phase of the compiler 
when generating the initial intermediate code representing 
the operations of the program. This is true so long as the 
invention has access to sufficient information to construct a 
data dependency graph for a program loop; can determine 
either the tripcount, or a variable containing the tripcount, of 
the program loop; and can determine the operations enclosed 
within the program loop. In particular, in one embodiment, 
the invention resides in the instruction scheduling or soft- 
ware pipelining module 48. However, another embodiment 
could effectively have the invention reside within other 
various optimization portions 46 of the compiler. In fact, the 
invention could be practiced in compilers that do not use the 
"front-end, optimizer, back-end" structure. 

The present invention increases the execution speed of 
target programs by reducing the number of instructions 
repeatedly executed during some programmed loops. It 
identifies program loop statements that contain omega- 
invariant statements and optimizes the execution speed of 
these loops. For example, the following SBBN loop (a "C" 
language "for" statement having a compile-time tripcount): 



x - 1; 

for (i = 0; i < . 
y - x * 5; 
a[i]»x; 
b[i] - y; 
x - 2; 

} 



■10; 



would be optimized by the invention as if written as: 



x - 1; 

y = x * 5; 
40] m x; 

b[0] o y; 

x = 2; 
y - x • 5; 
all] - x; 

b[i] - y; 
for (i - 2; i < - 10; { 
ali]-x; 
b[i] - r. 
} 



//peel 1 (iteration 0) 



//peel 2 (iteration 3) 



// iteration 2-9 



35 



45 



50 



60 



further, the following loop (containing an execute-time 
tripcount): 



65 



10 



x = 1; 

for (i = 0; i < « 
y - x • 5; 
a[i]-x; 
b[i] - y; 
x-2; 

} 



N; i++) { 



would be optimized by the invention as if written as: 



15 



20 



25 



30 



x - 1; 

if(N>2){ 
x - 1; 

y = x * 5; 

a[0] = x; 

b[0] - y; 

x = 2; 

y - x ■ 5; 

a[l]=x; 

b[l]-y; 
for (i = 2; i < = N; i++) { 

a[i] = x; 

Hi] - y; 

} else { 

for (i - 0; i < = N; i++) { 

y = x ' 5; 

a[i] - x; 

b[i] - y; 

x-2; 

} 

} 



//peel 1 (iteration 0) 



//peel 2 (iteration 1) 



// iteration 2-N 



// if N < -2 



Thus, in one aspect of the invention, single-basic-block- 
natural loops having a determinable number of iterations and 
containing omega-invariant operations are optimized during 
compilation, by unwinding omega iterations from the loop 
and then hoisting the stabilized omega-invariant operations 
from the kernel of the peeled loop. To determine which 
statements are omega -invariant, the compiler must first 
construct a data dependency graph. 

FIG. 4 illustrates a data dependency graph (DDG) 400 for 
a loop described as shown by the source code at label 401. 
As is well known in the art, this source is representative of 
a SBBN loop asdescribed using the "C" language "for" 
statement. Such a loop statement represents an iterative 
construct when complied into computer instructions. DDGs 
interrelate how statements in a loop depend on other state- 
ments. Nodes 402, 404, 406 in the graph represent the three 
statements enclosed in the loop, arcs 408, 410 represent the 
data dependencies between the nodes. Here both node A 402 
and node C 406 depend on node B 404. The numbers 412 
next to the arcs 408, 410 indicate the data dependency 
distance between the nodes. 1 Tius, arc 408 between node B 
404 and node A 402 has a data dependency distance of 1 
indicating that the assignment to x in one iteration is used by 
node A in the next iteration. Compare this with the arc 410 
between node B 404 and node C 406 that has a dependency 
distance of zero. Arc 410 indicates that the assignment to x 
in node B 404 is used by node C 406 in the same iteration. 
Thus, a data dependency graph represents which operations 
depend on other operations. The data dependency distance is 
used when determining an omega for an operation. 

FIGS. 5a, 5b, and 5c illustrate different ways a loop 
operation can be implemented. FIG. 5a illustrates the orga- 
nization of the instructions that effectuate a loop that has not 
been optimized by the invention. FIG. 56 illustrates the 
organization for loops having a compile-time determined 
number of iterations and optimized by the invention. FIG. 5c 
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illustrates the organization for loops having an execute-time 
determined number of iterations and optimized by the inven- 
tion. These figures are further described below. 

FIG. 5a illustrates the general organization of the iterative 
construct formed by the instructions that effectuate the loop 5 
operation described by a loop statement. When executing in 
a computer and as indicated by the arrow labeled 523, the 
loop operation starts when the flow of execution enters the 
prologue 501 of the iterative construct. The prologue 501 
contains instructions that initialize loop control variables 10 
and effectuate loop-invariant operations. For a SBBN loop, 
the instructions in the prologue are executed on entry into 
the loop. The iterative construct includes a kernel 503 that 
contains instructions that are repeatedly executed in 
sequence every iteration of the loop until the loop termina- is 
tion condition occurs. If the termination condition has not 
occurred, the kernel 503 repeats for another iteration. Thus, 
the computer again executes the instructions in the kernel 
503 as indicated by the arrow labeled 505. When the loop 
operation terminates, the flow of execution continues as 20 
indicated by the arrow labeled 507. 

FIG. 5b illustrates the general organization of the 
instructions, as optimized by the invention, that effectuate 
the loop operation described by a SBBN loop statement 
having a compile-time determined number of iterations (the 25 
tripcount). The optimized SBBN loop retains the same basic 
characteristics as shown in FIG. 5a, The prologue 507 of the 
optimized loop contains the same operations as did the 
prologue 501 of the loop in FIG. 5a. However, the prologue 
507 is expanded by the invention and now includes a number 30 
of peeled iterations 509 and a region that contains hoisted 
omega-invariant operations 511. Thus, the prologue 507 
again contains instructions that initialize loop control vari- 
ables and effectuate loop-invariant operations. After these 
instructions 507 execute, the flow of execution moves to the 35 
peels 509 (the iterations) unwound by the invention. Each of 
these peels 509 consist of the same operations as in the 
kernel 503 including any loop bookkeeping operations. In 
particular, peel #1 510 contains the instructions to effectuate 
the operations for the first iteration of the SBBN loop, peel 40 
#2 512 contains the instructions to effectuate operations for 
the second iteration and so forth through peel #n 514. 

When executed, peels #1 through #n 509 effectuate the 
first omega iterations for the iterative constructs operation. 
That is, these peels include omega-invariant operations that 45 
have not stabilized. The number of peels corresponds to the 
peelfactor determined during the optimization process.. After 
executing the peels 509, the flow of execution moves to the 
hoist region 511 where instructions that effectuate stabilized 
omega-invaliant operations are executed. The optimized 50 
kernel 513 no longer includes the omega-invariant operation 
as this operation has been hoisted to the hoist region 511. 
Because the optimized kernel 513 no longer includes the 
omega-invaliant instructions the optimized kernel 513 
executes faster then the kernel 503 described in FIG. 5a. 55 
When the loop operation terminates, the flow of execution 
continues as indicated by the arrow labeled 533. 

FIG. 5c illustrates the general organization of the 
instructions, as optimized by the invention, that effectuate 
the loop operation described by a SBBN loop statement 60 
having an execute-time determined number of iterations (the 
tripcount). Because the tripcount is determined at execution 
time, the iterative construct comprising the loop operation 
contains two versions of the loop operation — one version 
contains an optimized kernel 513 and the other contains a 65 
non-opfimized kernel 503. The compiler inserts additional 
operatioas 517 to select which version to use. These opera- 



tions are dependent upon the tripcount required for any 
particular invocation of the loop operation. When this opti- 
mized loop operation executes, the flow of execution enters 
the prologue 507* as indicated by the arrow marked 523, 
having the same characteristics as the prologue 507 shown 
in FIG. 5b. After initializing required loop control variables 
and executing loop-invariant instructions, the flow of execu- 
tion moves to the tripcount test 517. The instructions in the 
tripcount test 517 determine, by comparing the tripcount to 
the peelfactor, whether the optimized version of the SBBN 
loop operation is executed as indicated by the arrow marked 
521. If this path 521 is taken, execution occurs through the 
peels 509, the hoist region 511, and the optimized kernel 513 
in the same manner as was described for FIG. 5b. The flow 
of execution leaves the SBBN loop operation as indicated by 
the arrow labeled 525. 

If the tripcount test 517, determines that non-optimized 
loop operation is to execute, the flow of execution follows 
the path indicated by the arrow marked 519 and the loop 
operation occurs in the same fashion as was described for the 
loop shown in FIG. 5a. Those skilled in the art will recog- 
nize that many techniques exist to place the instructions to 
select which loop operation to use and that these instructions 
can be placed either before or after the prologue. 

FIGS. 6 and 7 illustrate how the invention is practiced in 
a compiler to generate the optimized loop operations having 
the structure shown in FIGS. 5b and 5c. 

FIG. 6, illustrates the general process 600 of an optimiz- 
ing compiler implementing an omega-invariant statement 
detection mechanism configured to detect omega-invariant 
operations in a SBBN loop. To perform this operation, the 
compiler detects omega-invariant operations within the loop 
operation. Once these operations are detected, an omega- 
invariant statement determination mechanism determines 
the omega for the statement. The flow of execution begins 
602 when this segment of the optimizing compiler receives 
incoming intermediate code representing a SBBN loop. 
When the compiler detects a SBBN loop 603 it constructs a 
data dependency graph from the intermediate code that 
represents that loop 604 and initializes the peelfactor for the 
loop to zero. Each node on the graph is initialized as variant 
with a stabilizer of zero 606. Then, using the DDG, all 
loop- invariant operations are marked invariant 608 so that 
they will be hoisted out of the loop's kernel and into the 
iterative constructs prologue. This is accomplished using 
prior art techniques such as those described in Aho. 

Next, each variant node is examined 610. If the examined 
node has only one arc, from a node in the loop marked 
invariant — the source node, the examined node is also 
marked invariant 612. The omega (the stabilizer value of the 
operation) for the examined node is set to the maximum of 
the existing omega for the examined node and the omega for 
the source node plus the dependency distance on the arc 
from the examined node to the source node. Finally, the 
peelfactor is set to the maximum of its current value and the 
omega for the current node. 

Now at step 614, if any node was marked invariant during 
the just completed pass over the variant nodes, the process 
repeats by again examining each variant node 610. If no new 
nodes were marked invariant during the just completed pass, 
the detection process completes through 616. 

FIGS. 7, la and lb illustrate how the invention optimizes 
the omega-invariant operations detected as shown in FIG. 6. 

FIG. 7 illustrates the general process 700 of an optimizing 
compiler implementing the invention for hoisting omega- 
invariant operations from the loop's kernel. This process, 
when performed by a computer, comprises a loop optimi- 
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zation mechanism. This mechanism adjusts the intermediate 
code representation of the loop operation, based on the 
omega values determined by the techniques described with 
FIG. 6, so as to optimize the loop operation. The compiler 
first determines whether the number of iterations for the loop 5 
is known at compile time. If so, then the optimizer creates 
an iterative construct of the form shown in FIG. 5b. If the 
number of iterations for the loop will only be known during 
execute-time, then the optimizer operates to create in itera- 
tive construct of the form shown in FIG. 5c. Hoisting of 30 
omega-invariant operations begins 702 upon receiving a 
DDG and the intermediate code that has been marked as 
explained in the description of FIG. 6. Next, the compiler 
determines whether the tripcount of the loop can be deter- 
mined during compilation or is to be determined during is 
execution 704. If the tripcount can be determined at com- 
pilation time the compiler processes 710 the loop as illus- 
trated by FIG. la. If the tripcount can only be determined at 
execution time, the compiler processes 712 the loop as 
illustrated by FIG . lb. In either case, the hoisting mechanism 20 
completes through the exit block 714 producing adjusted 
intermediate code that reflects the modifications made by the 
invention. The operations described by this adjusted inter- 
mediate code are eventually converted to binary opcodes 
specific to a target computer architecture with the structure 25 
as indicated in FIG. 5b or 5c. 

FIG. la illustrates the mechanism used by the compiler 
that optimizes loop operations by adjusting the intermediate 
code, when the loop's tripcount can be determined at com- 
pile time. The hoisting process begins 710 upon receiving a 30 
DDG and the intermediate code. Then 714, loop invariant 
operations are hoisted to the prologue. Next 716, the 
compiler, using the loop peeling mechanism, unwinds peel- 
factor iterations (peelfactor is the maximum omega for all 
the statements within the loop) from the kernel, leaving 35 
omega peeled iterations and a kernel having a stabilized 
omega-invariant operation. Then 718, the compiler hoists 
the stabilized omega-invariant operations to the hoist region 
from, the kernel, using prior art methods and techniques, 
leaving an optimized kernel. Finally, this process exits 40 
through the block labeled 719. One skilled in the art will 
recognize that it is within the scope of the invention to hoist 
operations that stabilize before peelfactor so as to optimize 
subsequent peeled iterations. 

FIG. lb illustrates how the compiler hoists operations 45 
when the loop's tripcount can not be determined at compile 
time. In this circumstance, the tripcount is determined 
during the flow of execution. When the iterative construct 
representing the loop operation is entered, the tripcount is 
evaluated, and either a non-optimized loop operation 50 
selected for further execution (if the tripcount is less than the 
number of peels) or, if the tripcount is large enough, an 
optimized loop operation is selected. 

The compiler's hoisting process begins 722 upon receiv- 
ing a DDG and the intermediate code. Then 724, loop 55 
invariant operations are hoisted to the prologue. Because the 
tripcount is determined at execution time, the iterative 
construct effectuates two versions of the loop operation -one 
version contains the prologue, peels, the hoisted omega- 
invariant operations and the optimized kernel (the optimized 60 
loop operation) and the other contains only the prologue and 
the non -optimized kernel. The compiler contains a loop 
selection insertion mechanism that inserts 726 additional 
operations to select which version of the loop operation to 
use. These selection operations are dependent upon the 65 
tripcount required by any given invocation of the loop 
operation. These selection operations are inserted in the 
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prologue and comprise a comparison of the tripcount with 
the peelfactor. If, at execution, the tripcount is greater then 
the peelfactor, the peels, the hoisted omega-invariant opera- 
tions and the kernel are executed. Otherwise, the loop 
operation containing the unpeeled kernel is executed. Those 
skilled in the art will recognize that the scope of the 
invention encompasses an embodiment that creates addi- 
tional optimized loop operations using differing pee If actors 
along with the additional program logic to select between 
them. 

Once the selection logic is inserted in the adjusted inter- 
mediate code representation, the SBBN loop operation, as 
defined by the intermediate code, is duplicated 728 by a loop 
duplication mechanism. One of the duplicates is then pro- 
cessed 730 by a loop peeling mechanism as explained in the 
description of FIG. la, above, to peel peelfactor iterations 
from the kernel, leaving omega peeled iterations of the 
loop's kernel and the kernel having a stabilized omega- 
invariant operation. Finally 732, the stabilized omega- 
invariant operations are hoisted, by a hoisting mechanism, to 
the hoist region from the kernel leaving an optimized kernel 
and the process exits 734. This process leaves an iterative 
construct in the form shown in FIG. 5c and explained above. 
Again, one skilled in the art will recognize that it is within 
the scope of the invention to hoist operations that stabilize 
before peelfactor so as to optimize subsequent peeled itera- 
tions. 

One skilled in the art will understand that the invention as 
described above teaches a computer system for optimizing 
programmed loop statements, an apparatus for optimizing 
programmed loop statements, a code optimizer for use with 
a compiler, a computer controlled method, and a computer 
program product embodied in a computer readable program 
code mechanism. 

Further, one skilled in the art will understand that various 
modifications and alterations may be made in the preferred 
embodiment disclosed herein without departing from the 
scope of the invention. Accordingly, the scope of the inven- 
tion is not to be limited to the particular invention embodi- 
ments discussed above, but should be defined only by the 
claims set forth below and equivalents thereof. 
What is claimed is: 

1. A computer system having a central processing unit 
(CPU) and a random access memory (RAM) coupled to said 
CPU, for use in compiling a target program for use with a 
target computer architecture, said computer system compris- 
ing: 

(a) a compiler system resident in said computer system 
having an optimization mechanism comprising: 

(b) a loop-invariant statement optimization mechanism 
configured to optimize a loop statement containing a 
loop-invariant statement, said loop statement repre- 
senting an iterative construct and said loop statement 
having the characteristics of a single basic block 
natural loop and including a determinable number of 
iterations; 

(c) an omega -invariant statement detection mechanism 
configured to detect a statement that is variant for at 
least the first iteration of said loop statement but 
which becomes invariant before the completion of 
the execution of all iterations of said loop statement 
within said loop statement; 

(d) said omega-invariant statement determination 
mechanism being further configured to determine an 
omega such that after omega iterations said omega- 
invariant statement becomes invariant, said omega- 
invariant statement determination mechanism 
including 
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i. a loop detection mechanism configured to dupli- 
cate said iterative construct leaving said iterative 
construct and a duplicated iterative construct hav- 
ing a kernel; 

ii. a loop peeling mechanism configured to unwind 
omega iterations from said kernel leaving one or 
more peeled iterative and said kernel having a 
stabilized omega-invariant operation; 

iii. a hoisting mechanism configured to hoist said 
stabilized omega- invarialit operation out of said 
kernel; said peeled iterations and said hoisted 
omega-invariant operation and said kernel com- 
prising an optimized iterative construct; and 

(e) an omega-invariant statement optimization mecha- 
nism configured to optimize said iterative construct 
responsive to said omega. 

2. The computer system of claim 1 wherein said deter- 
minable number of iterations is determined at execute-time, 
said loop optimization mechanism further comprising: 

a loop selection insertion mechanism configured to insert 
an execute-time loop selection mechanism, said loop 
selection mechanism configured to select either said 
optimized iterative construct or said duplicated itera- 
tive construct responsive to said execute-time deter- 
mined number of iterations. 

3. An apparatus for optimizing execution time for execut- 
able instructions in a target program that is designated to run 
on a target computer architecture, said apparatus compris- 
ing: 

(a) a computer having a processor, a memory and an 
input/output section; 

(b) a compiler system resident in said computer memory 
having an optimization mechanism comprising: 

(c) a loop-invariant statement optimization mechanism 
configured to optimize a loop statement containing a 
loop -invariant statement, said loop statement repre- 
senting an iterative construct and said loop statement 
having the characteristics of a single basic block 
natural loop and including a determinable number of 
iterations; 

(d) an omega-invariant statement detection mechanism 
configured to detect a statement that is variant for at 
least the first iteration of said loop statement but 
which becomes invariant before the completion of 
the execution of all iterations of said loop statement, 
said omega-invariant statement determination 
mechanism including 

i. a loop detection mechanism configured to dupli- 
cate said iterative construct leaving said iterative 
construct and a duplicated iterative construct hav- 
ing a kernel; 

ii. a loop peeling mechanism configured to unwind 
omega iterations from said kernel leaving one or 
more peeled iterative and said kernel having a 
stabilized omega-invariant operation; 

iii. a hoisting mechanism configured to hoist said 
stabilized omega-invariant operation out of said 
kernel; said peeled iterations and said hoisted 
omega-invariant operation and said kernel com- 
prising an optimized iterative construct; and 

(e) said omega-invariant statement determination 
mechanism being further configured to determine an 
omega such that after omega iterations said omega- 
invariant statement becomes invariant; and 

(f) an omega-invariant statement optimization mecha- 
nism configured to optimize said iterative construct 
responsive to said omega. 
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4. The apparatus of claim 3 wherein said determinable 
number of iterations is determined at execute-time, said 
omega-invariant statement optimization mechanism further 
comprising: 

a loop selection insertion mechanism configured to insert 
an execute-time loop selection mechanism, said loop 
selection mechanism configured to select either said 
optimized iterative construct or said duplicated itera- 
tive construct responsive to said execute-time deter- 
mined number of iterations. 

5. A code optimizer for use in a compiler system for 
compiling a target program to run on a target computer 
architecture, said code optimizer comprising: 

(a) a first portion configured to accept, as input, an 
intermediate code representation representing an itera- 
tive construct, said iterative construct having the char- 
acteristics of a single basic block natural loop including 
a determinable number of iterations; 

(b) a second portion, coupled to said first portion, con- 
figured to optimize a loop statement containing a 
loop-invariant statement; 

(c) a third portion, coupled to said first portion, configured 
to detect an omega-invariant operation within said 
iterative construct said omega-invariant operation 
being variant for at least the first iteration of said 
iterative construct but which becomes invariant before 
the completion of said iterative construct and to deter- 
mine an omega such that after omega iterations said 
omega-invariant operation becomes invariant, said 
third portion including 

i. a loop detection mechanism configured to duplicate 
said iterative construct leaving said iterative con- 
struct and a duplicated iterative construct having a a 
kernel; 

ii. a loop peeling mechanism configured to unwind 
omega iterations from said kernel leaving one or 
more peeled iterative and said kernel having a sta- 
bilized omega-invariant operation; 

iii. a hoisting mechanism configured to hoist said 
stabilized omega -invariant operation out of said ker- 
nel; said peeled iterations and said hoisted omega- 
invariant operation and said kernel comprising an 
optimized iterative construct; 

(d) a fourth portion, coupled to said third portion and 
configured to optimize said iterative construct respon- 
sive to said omega; said second and fourth portions 
generating an adjusted intermediate code representa- 
tion; and 

(e) a fifth portion configured to output said adjusted 
intermediate code representation. 

6. The code optimizer of claim 5 wherein said determin- 
able number of iterations is determined at execute-time, and 
wherein said fourth portion further comprises: 

a sixth portion to insert an execute-time loop selection 
mechanism to select either said optimized iterative 
construct or said duplicated iterative construct; said 
loop selection mechanism responsive to said determin- 
able number of iterations, said sixth portion, said 
seventh portion, said eighth portion, and said ninth 
portion generating said adjusted intermediate code rep- 
resentation. 

7. A computer controlled method of optimizing a loop 
statement within a target program directed at a target com- 
puter architecture, said loop statement describing an itera- 
tive construct, said loop statement having the characteristics 
of a single basic block natural loop including a detenninable 
number of iterations, said method comprising the steps of: 
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(a) detecting said loop statement containing one or more 
body statements; 

(b) optimizing said loop statement with respect to a 
loop-invariant statement if such exists within said loop 
statement; 5 

(c) detecting that one of said body statements is an 
omega-invariant statement that is variant for at least the 
first iteration of said loop statement but which becomes 
invariant before the completion of the execution of all 
iterations of said loop statement; 30 

(d) determining an omega associated with said omega- 
invariant statement such that after omega iterations said 
omega-invariant statement becomes invariant, said step 
of determining including the sub-steps of: 

(i) duplicating said iterative construct to provide a said 
iterative construct and a duplicated iterative con- 
struct having a kernel; 

(ii) unwinding omega iterations from said kernel to 
provide one or more peeled iterative and said kernel 2Q 
having a stabilized omega-invariant operation; 

(iii) hoisting said stabilized omega-invariant operation 
out of said kernel such that said peeled iterations and 
said hoisted omega-invariant operation and said ker- 
nel comprising an optimized iterative construct; and 25 

(e) optimizing said iterative construct responsive to said 
omega, 

8. The method of claim 7 wherein said determinable 
number of iterations is determined at execute-time, and 
wherein step (e) further comprises: 3 q 

inserting an execute-time loop selection mechanism con- 
figured to select either said optimized iterative con- 
struct or said duplicated iterative construct, said loop 
selection mechanism responsive to said execute-time 
determined number of iterations. 35 

9. A computer program product comprising: 

(a) a computer usable medium having a computer read- 
able program code mechanism embodied therein to 
optimize a target program containing executable 
instructions directed toward a target computer 40 
architecture, said computer readable program code 
mechanisms in said computer program product com- 
prising: 

(b) computer readable code first optimization mecha- 
nisms to cause a computer to optimize a loop state- 45 
ment containing a loop-invariant statement, said loop 



18 

statement representing an iterative construct and said 
loop statement having the characteristics of a single 
basic block natural loop and including a determin- 
able number of iterations; 

(c) computer readable code detection mechanisms to 
cause said computer to detect an omega-invariant 
statement that is variant for at least the first iteration 
of said loop statement but which becomes invariant 
before the completion of the execution of all itera- 
tions of said loop statement within said loop 
statement, said code detection mechanisms being 
configured to perform the operations of: 

(i) duplicating said iterative construct to provide a 
said iterative construct and a duplicated iterative 
construct having a kernel; 

(ii) unwinding omega iterations from said kernel to 
provide one or more peeled iterative and said 
kernel having a stabilized omega-invariant opera- 
tion; 

(iii) hoisting said stabilized omega-invariant opera- 
tion out of said kernel such that said peeled 
iterations and said hoisted omega-invariant opera- 
tion and said kernel comprising an optimized 
iterative construct; 

(d) computer readable code determination mechanisms 
to cause said computer to determine an omega asso- 
ciated with said omega-invariant statement such that 
after omega iterations said omega-invariant state- 
ment becomes- invariant; 

(e) computer readable code omega-invariant statement 
optimization mechanisms to cause said computer to 
optimize said iterative construct responsive to said 
omega. 

10, The computer program product of claim 9 wherein 
said determinable number of iterations is determined at 
execute-time in which said computer readable code omega- 
invariant statement optimization mechanisms comprise; 
computer readable code insertion mechanisms to cause 
said computer to insert an execute-time loop selection 
mechanism, said loop selection mechanism configured 
to select either said optimized iterative construct or said 
duplicated iterative construct responsive to said 
execute-time determine number of iterations. 

* * * * * 
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SOFTWARE DEBUGGING METHOD AND 
APPARATUS 

RELATED APPLICATION DATA 

This application claims benefit of Provisional Patent 
Application Serial No. 60/092,796, filed on Jul. 14, 1998, 
the disclosure of which is incorporated herein by reference. 

FIELD OF THE INVENTION 

The invention relates to debugging of computer code. 
More specifically, the invention relates to a method and 
apparatus for debugging computer code that does not require 
modification of the target executable program that is to be 
debugged. 

DESCRIPTION OF THE RELATED ART 

Computer programming has become a very complex 
operation because of the increased sophistication of com- 
puters and the increased complexity of tasks to be carried out 
by computers. Further, computer hardware is now relied 
upon for mission critical tasks, such as air traffic control, 
communications, medical equipment, and the like. It follows 
that computer software, i.e. a computer program, is control- 
ling these tasks. Note that the phrase "computer program" is 
used herein to refer to a set of software instructions and/or 
routines for controlling computer hardware. The word 
"application" is used herein to refer to the practical use of 
the program, for example, inventory control, accounting or 
any other use. 

Of course, a higher level of software complexity results in 
a higher potential for programming errors, known as "bugs". 
For example, many programs consist of thousands or even 
millions of lines of code. The sheer volume of code creates 
great potential for bugs. Also, as a result of the size of most 
computer programs, the code often is developed by plural 
teams of programmers each working on a different portion 
or aspect of the program. The teams may accomplish similar 
tasks in a different manner or otherwise produce code that is 
incompatible with code developed by other teams. Due to 
the huge reliance on computer software in just about every 
aspect of society, it is important to find and eliminate as 
many bugs as possible in an efficient manner. Also, it is often 
desirable to insure the performance of a computer program. 
For example, the end user may want to be guaranteed that a 
particular variable is updated very quickly. 

To find errors in code and assure performance, a process 
called "debugging" has developed. One form of conven- 
tional debugging is called "interactive debugging". Interac- 
tive debugging includes stopping the program at desired 
break points, known as "breaks", for examination of pro- 
gram status. "Noninteractive debugging" does not stop the 
program. However it uses debugging statements which are 
inserted into the source code and the source code is then 
compiled into machine code including the debugging state- 
ments. Once all errors have been eliminated and the program 
is operating in a desirable manner, the debugging statements 
are removed from the source code and the program is 
recompiled and used for its intended task. 

Conventional debugging techniques, while sometimes 
effective, have several shortcomings. In particular, conven- 
tional noninteractive debugging techniques require access to 
the source code for insertion of the debugging statements. 
Often, the user license for a particular program does not 
grant access to the source code or the right to change the 
source code. Therefore, in most cases, the user must rely on 
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the developer of the software to provide all debugging. 
Often programming errors occur only in a particular appli- 
cation of a program and thus are not caught by the developer 
during initial debugging. Also, the developer may be 

5 unavailable or unwilling to provide subsequent debugging 
services. Further, even if access to the source code is not an 
obstacle, the program as run during noninteractive debug- 
ging has been altered to accommodate the debugging state- 
ments. Therefore, the program might not run in the same 

10 manner during debugging as it will during normal operation. 
The debugging statements may mask errors, create errors, 
slow operation, or otherwise affect the function or perfor- 
mance of the program. Finally, many programs behave 
differently or even refuse to run when stopped. Therefore, 

is conventional interactive debugging using break points is not 
reliable and is difficult to implement. 

SUMMARY OF THE INVENTION 

It is an object of the invention to overcome the limitations 
20 of conventional debugging techniques. 

It is another object of the invention to permit debugging 
of a computer program without stopping execution of the 
program. 

25 It is another object of the invention to permit noninter- 
active debugging of a computer program without accessing 
or altering the source code of the program. 

It is another object of the invention to conduct perfor- 
mance testing of a computer program under actual execut- 
30 able conditions of the program. 

It is another object of the invention to conduct require- 
ment verification under actual executable conditions of the 
program. 

It is another object of the invention to increase the speed 

35 and efficiency of software development. 

It is another object of the invention to facilitate software 
comprehension. 

The invention uses "dynamic linking" to use the existing 

40 target program executable for debugging and thus avoids the 
long edit-compile-link cycles inherent in the use of debug- 
ging statements in source code. By linking new user actions 
into the executable image, the target program is allowed to 
run as it would normally run thus allowing debugging of 

45 time sensitive programs without the need to stop execution 
thereof. Calls to the user actions are inserted into the 
memory image during loading of the target program. The 
calls can be inserted at any time during or after transfer of 
the executable code into memory. Accordingly, the term 

50 "loading" as used herein refers to any time during or after 
transfer of the target program into memory for the purpose 
of running the program. The invention can be used to locate 
and fix programming errors, for requirements verification, 
for performance evaluation, for software comprehension, or 

S5 for any other evaluation of software. Accordingly, the term 
"debugging" as used herein refers broadly to any type of 
diagnosis or evaluation of software. 

A first aspect of the invention is a method for debugging 
a computer program comprising the steps of developing a 

60 debugging subprogram having a user action for debugging a 
target program, loading the target program for execution, 
inserting a call to the debugging subprogram into a memory 
image of the target program during the loading step, and 
executing the target program. 

65 A second aspect of the invention is a computer readable 
medium having instructions for debugging a computer pro- 
gram recorded thereon. The medium a includes a first set of 
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instructions for loading the target program for execution, child process 22 has received the acknowledgment from 

and a second set of instructions for inserting a call to a parent process 20 of a successful attachment and continues 

debugging subprogram having user actions into a memory on to patch into parent process 40 a call to the dynamic user 

image of the target program during loading of the target actions runtime start routines, i.e. the debugging actions. In 

program. 5 particular, calls to user actions in user action libraries 12 are 

inserted in the memory image of target program 10 (which 

BRIEF DESCRIPTION OF THE DRAWING is running as will be seen below) and user action libraries 12 

are loaded into RAM in a separate area. In step F, child 

The invention is described through a preferred embodi- process 22 waits for a new request for patching service from 

ment and the attached drawing in which: ^ pare nt process 20. In step G, child process 22 has received 

FIG. 1 is a schematic diagram illustrating the primary a request from parent process 20 and executed the request, 

components of a preferred embodiment of the invention; Steps F and G are repeated until the task is terminated by 

FIG, 2 is a flow chart illustrating the operation of the P are nt process 20. 

dynamic action linker in accordance with the preferred Of course, parent process 20 runs simultaneously with 

embodiment; 15 steps C-G of child process 22. In step H, parent process 20 

FIG. 3 is a flow chart illustrating the steps performed to waits umiI child Process 22 has attached to parent process 20 

patch a call into the target program; usin S tne operating system debug facility. In step I, parent 

FIG. 4 is a flow chart illustrating the steps performed to proceSS 20 loads th * user P ro S ram ' ^ es the current 

patch code into the parent process by the child process in ^"^y image with target program 10 in a known manner, 

accordance with the preferred embodiment; and 20 chlld P ro <*f » ha * ™*>1 °f P™* P«J°« 20, the 

™- _ . „ , operating system loads target program 10 and holds parent 

FIG. 5 is a flow chart illustrating the steps performed to process 20 while it notifies child process 22. Child process 

patch code into the parent process by the child process when 22 patcheS) in step Ej in such a way that it ^ load and 

user actions are requested for entry and exit of a routine in bfanch dynamic uscr actions (0 targct progfam 10 . Q step j 

accordance with the preferred embodiment. ^ ^ patched targe( program 10 then continues to run in step 

DETAILED DESCRIPTION OF TOE ^ ^ e new dvnam ^ c user act i Qns fr° m user action 

PREFERRED EMBODIMENT libraries 12. After completing the patched user actions, 

target program 10 continues executing in step L. 

FIG. 1 illustrates the primary components of the preferred FIG. 3 illustrates Step E of FIG. 2, i.e., patching of 

embodiment. Target program 10 is the existing program in 30 dynamic user actions into target program 10 in detail. Child 

executable format that is to be debugged. Target program 10 process 22 created by dynamic action linker 18 must patch 

is a machine code executable file that has been compiled the memory image of target program 10 so that it will call 

from source code in a known manner using a known the newly loaded user action routines. In step El, child 

software compiler. User action libraries 12 are created by process 22 allocates space for the patch in the patch area in 

compiling one or more debugging subprograms 16 35 parent process 20. In step E2, child process 22 replaces an 

(containing user actions) into linkable libraries (similar to instruction (or instructions) at the requested program loca- 

Dynamic Link Libraries (DLL) in a Windows™ tion with a branch instruction to the patch area. In step E3, 

environment). This compiling procedure is accomplished by child process 22 generates code to call the user action, 

action compiler 14 which compiles the source code of FIG. 4 illustrates the function of the code patched into the 

subprograms 16, which are written in ANSI C for example, ^ parent pr0 cess 20 by child process 22. In step M, a subrou- 

into machine code of user action libraries 12. These files t i ne 0 f target pr0 gram 10 begins. In step AB, a subroutine 

together with target program 10 are input to dynamic action ca n p i ace d in the subroutine branches to patch area 42. In 

linker 18. Note that there can be multiple user action patc h area 42, code to do the following actions has been 

libraries 12, including a runtime library. written: allocate space on the patch stack (AC), save any 

Dynamic action linker 18 reads target program 10 and 45 registers to the patch stack (AD), call the dynamic user 

user action libraries 12 and creates two processes. A first action routine (AE) enter the dynamic user action (AF), 

process 20 created by dynamic action linker 18 consists of execute the dynamic user action and return to the patch area 

target program 10 and the debugging user actions needed (AG), restore any saved registers from the patch stack (AH), 

from user action libraries 12. A second process 22 created by return the space allocated from the patch stack (AI), execute 

dynamic action linker 18 handles requests from process 20, 50 the instruction or instructions that were removed from the 

to modify code locations in process 20 as described in detail subroutine entry sequence (AJ), and return to subroutine 

below. Action compiler 14 and dynamic action linker 18 can (AK). In step AL remaining code is executed and execution 

be recorded as instructions on a computer readable medium. is returned to the caller of the subroutine in step AM. 

FIG. 2 illustrates the operation of dynamic action linker FIG. 5 illustrates the function of parent process 20 and the 

18. In step A, dynamic action linker 18 forks a child process 55 child process 22 when user actions have been requested for 

to create a parent process which corresponds to process 20 the exit as well as the entry to a routine. In step BA, a 

disclosed above and a child process which corresponds to subroutine begins. In step BB, a call has been inserted in the 

process 22 above. The fork system code is a well known subroutine and thus it branches into patch area 42. In patch 

service supported by most operating systems in which a area 42, code to do the following actions has been written: 

copy known as a "child" is made of parent program in RAM. 60 allocate spaced on the patch stack (BC), save any registers 

Child process 22 allows attachments of debugging com- to the patch stack (BD), enter the user action (BF), call the 

mands. In step B, both parent process 20 and child process user action routine (BE), execute the user action and return 

22 identify themselves and the execution path splits. Child to the patch area (BG), replace the subroutine's return 

process 22 then attempts to attach to parent process 20 using location with dynamic exit action patch routine (BH), 

known available operating system services in step C. Child 65 restore any saved registers from the patch stack (BI), return 

process 22 waits for an acknowledgment from parent pro- the space allocated from the patch slack (BJ), execute the 

cess 20 that it has successfully attached in step D. In step E, instruction or instructions that were removed from the 
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subroutine entry sequence (BK), and return to subroutine 
(BL). In step BM, the rest of the subroutine is executed. In 
step BN execution returns to the dynamic exit action routing 
(BN). A user action also begins in step (BO). Space on the 
patch stack is allocated and any registers are saved to the 5 
patch stack in step BP. The dynamic user action is executed 
in step BQ. The saved registers are restored and the space 
allocated on the patch stack is returned in step BR. In step 
BQ execution returns to the caller of the subroutine in step 
BJ. 10 

The various components and functions described above 
can be in the form of computer software running on a 
computer such as a standard personal computer or a server. 
The software can be written with known programming 
languages and stored on known media. For example, mag- 15 
netic media, such as a removable diskette or a hard drive can 
be used. Also, the software can be stored on optical media, 
such as CDROM. Alternatively, any computer readable 
medium can be used to store the software of the preferred 
embodiment. The invention has been described through a 20 
preferred embodiment. However, various modifications can 
be made without departing from the scope of the invention 
and defined by the appended claims. 

The invention allows the exact executable that will be 
shipped to a customer to be tested/verified. The invention 25 
can be used for subprogram call tracing, logging subpro- 
gram calls and parameter values, performance 
measurements, memory usage tracking and detections of 
memory violations, fault injection, test coverage, component 
testing, program and data verification, and requirements 30 
verification. 

What is claimed is: 

1. A method for debugging a computer program compris- 
ing the steps of: 

developing a debugging subprogram having a user action 

for debugging a target program; 
loading the target program for execution; 
inserting a call to the debugging subprogram into a 

memory image of the target program during said load- 40 

ing step; and 

executing the target program with the call inserted therein. 

2. A method as recited in claim 1, wherein said developing 
step comprises the steps of: 

preparing source code of a debugging routine: and 45 
compiling the source code of the debugging routine into 
the debugging subprogram. 

3. A method as recited in claim 2, wherein said inserting 
step comprises the steps of: 5Q 

allocating patch space in the debugging subprogram; 
replacing instructions of the debugging subprogram with 

a branch to the patch space; and 
generating object code to call the user action. 

4. A method as recited in claim 1, wherein said step of 55 
executing the target program comprises the steps of: 
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executing the debugging subprogram when the call to the 
debugging subprogram is encountered in the target 
program; and 

returning to execution of the target program after said step 
of executing the debugging subprogram. 

5. A method as recited in claim 1, wherein said step of 
executing the target program comprises the steps of: 

forking a child process of the debugging subprogram; and 
loading the target program with the parent process of the 
debugging subprogram. 

6. A method as recited in claim 5, wherein said step of 
inserting comprises patching a call to a dynamic user action 
into the target program with the child process. 

7. A method as recited in claim 2, wherein said preparing 
step comprises inserting a user action into the source code to 
reference entities of the target program. 

8. A method as recited in claim 7, wherein said entities 
comprise at least one of variables, functions, expressions 
and objects declared in the target program. 

9. A computer readable medium having instructions for 
debugging a computer program recorded thereon, said 
medium comprising: 

a first set of instructions for inserting a call to a debugging 
subprogram containing a user action into a memory 
image of the target program during loading of the target 
program; and 

a second set of instructions for executing the target 
program with the call inserted therein. 

10. A medium as recited in claim 9, wherein said first set 
of instructions comprises: 

instructions for allocating patch space in the debugging 
subprogram; 

instructions for replacing instructions of the debugging 
subprogram with a branch to the patch space; and 

instructions for generating patch object code to call the 
user action. 

11. A medium as recited in claim 9, wherein said second 
set of instructions comprises: 

instructions for executing the debugging subprogram 
when the call to the debugging subprogram is encoun- 
tered in the target program; and 

instructions for returning to execution of the target pro- 
gram after executing the debugging subprogram. 

12. A medium as recited in claim 9, wherein said second 
set of instructions further comprises: 

instructions for forking child process of the debugging 

subprogram; and 
instructions for loading the target program with the parent 

process of the debugging subprogram. 

13. A medium as recited in claim 12, wherein said first set 
of instructions further comprises instructions for patching a 
call to a dynamic user action into the target program with the 
child process. 

* * * * * 
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[57] ABSTRACT 

A computer program to be compiled is optimized prior 
to carrying out the final compilation. Subgraphs within 
the program are identified and examined for optimiza- 
tion beginning with the entire program as the largest 
subgraph. The number of entities in each subgraph 
which are relevant to each dimension of arrays used to 
represent data flow equations is determined. Next, the 
amount of memory required to contain the arrays is 
determined. If that memory requirement is within a 
predefined memory usage limit for the compilation, 
then a specified procedure of the compilation process is 
applied. If the memory requirement to contain the ar- 
rays exceeds the predefined memory usage limit for the 
compilation, the process is repeated for successively 
smaller subgraphs within the program in an attempt to 
find a subgraph to which the memory limits allow appli- 
cation of the specified procedure. 
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This information is derived from the control flow- 

PARTITIONING OPTIMIZATIONS IN AN graph which is a directed graph depicting the possible 

OPTIMIZING COMPILER execution paths of the program. In a low order flow- 

graph, the nodes represent basic blocks of a program 

This invention relates to the improvement of the 5 and these are connected by directed edges representing 
optimizing process in an optimizing compiler. More paths along which control in the program flows. In a 
particularly, it relates to a method for preventing the high order flowgraph the nodes are comprised of basic 
optimization process from being abandoned because of blocks and/or strongly connected regions, 
lack of memory space when compiling large programs i n the present specification, the term "basic block" 
in optimizing compilers that use global optimization 10 means any set of instructions in a computer program, 
algorithms (herein called "specified procedures") to whether object or source code, which is a straight-line 
improve the quality of the code that they generate. sequence of code into which branches reach only its 

BACKGROUND OF THE INVENTION *" irsl mstru ction, and from which control leaves the 

^ . . basic block only after the last instruction. 

Optimizing compilers are important took for pro- 15 nt ^ « fitrong i y connected region" means a set of 
grammers to improve the effectiveness and efficiency of nodcs which ^ w h a ^ that ^ ^ rcpC at- 

the target CPU. The goal of an optimizing compiler is ^ followed by the program control without passing 
to generate the smallest and fastest set of object code ^rou gh a node outside the region. Strongly connected 
possible that exactly duplicates the function of the pro- rc ^ Qns m wclJ m ^ ^ of Ucr desj A 

gram as it was written^ In order to generate compact 20 , connected region" is a strongly 

and efficient object code for computer programs writ- ^Mri rcgion ^ has onJy one nodc that * bea- 
ten in high level languages, compilers used for such from ^ ^ sing)e cn try strongly connected region, 
languages must utilize sophisticated global optimizers fa ^ di £ Iosurc ; tne J£ .. fCgion » me ^ s a 

which generally use various specified procedures for c ^„ nUt r *™„ 

reducing the run time or the memory requirements of a 25 SlD |! C C " try S ^° n * Iy T 8 ^ rCgl0n ' K . # . f 
program For example a compiler may perform any or "subgraph" means any combination of 

all of: common subexpression elimination, code mo- nodes within the Aowgraplv All strongly connected 
tion, strength reduction (replacing a slow operation by rc « lon f m a,so "fbgrmphs, but not all subgraphs are 
an equivalent fast operation), store motion and remov- stro J* ly « 

ing useless code sequences. Descriptions of some of 30 . ^ x * m "«■«»" «f« * the components of an 
these optimizations can be found in: intermediate representation which are used to describe 

J. T. Schwartz, On Programming-An Interim Re- 8 P'°gram as it is being compiled. These include van- 
port on the SETL Language. Installment II: The able CTtnes « dictionary entries, results, expressions, in- 
SETL Language and Examples of its Use, Courant structions, and basic blocks of a program. 
Institute of Math Sciences, NYU, 1973, pp.35 The larger and more complex a program is, the larger 
293-310. more convoluted is its flowgraph, the greater the 

E. Morel and C. Renvoise, Global Optimization by number of calculations involved and the greater the 
Suppression of Partial Redundancies, CACM, Vol. number of expressions for which dataflow equations 
22, No. 2, pp, 96-103, 1979. must be solved. Memory requirements and processing 

A. Aho and J. UUman,' Principles of Compiler De- 40 timc for the compilation tend to increase quadratically 
sign, Addison -Wesley, 1977. 85 a function of source program size for global optimiza- 

Each of these optimizing specified procedures trans- tion. When a situation arises where the compiler cannot 
forms an intermediate language (IL) program into a optimize an entire program because of a space restric- 
semantically equivalent but more efficient IL program. tion » m the P 1 ** thc optimization has had to be aban- 
Intermediate level language, as its name implies, is be- 45 doned. Attempts have been made to improve the quality 
tween a high level source program and machine code in of optimizations in the past. A small number of patents 
complexity and sophistication. An intermediate level nas ***** granted on inventions in this area, 
language can be especially useful in preparing compilers U.S. Pat. No. 4,506,325 discloses a method of de- 
that are to be capable of translating any of several high creasing the storage requirements of a compiler by en- 
level languages into machine code targetted to any of 50 coding operators and operands using an information 
several machines; it reduces markedly the number of theoretic encoding technique, applied to segments of a 
products that must be developed to cover a wide range program. The disclosure does not deal with how a pro- 
of both machine types and programming languages, gram is segmented. 

because all may translate through a common intermedi- U.S. Pat. No. 4,57 1 ,678 discloses a method of utilizing 
ate level language. It is at the intermediate language 55 the limited number of registers in a target computer by 
level that most optimizations are commonly performed. ■ improving register allocation procedures. It does not 
The most important optimizations in an optimizing disclose any way of handling large programs that ex- 
compiler are carried out globally, that is, on a program- ceed the general memory availability of the target corn- 
wide level, rather than on a localized or basic block puter. 

level. In performing each of these optimizations, a series 60 There remains a need for a program compilation 
of data flow equations must be solved. In doing so, the technique that does not simply give up the struggle if an 
compiler gathers information about the expressions in optimization cannot be performed within the con- 
the program being compiled; such information is depen- straints of the hardware or computer on which the 
dent upon the flow of control in the program. For its program is being compiled. 

own unique code transformation, each optimization 65 It has now been discovered that the scope on which 
roust have a method of tracking when and how any an optimization is applied can be limited, and yet many 
given expression is available throughout the program as of the benefits of optimization can be realized. The 
compiled. program unit can be partitioned on the basis of its con- 
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trol flow structure into sections sufficiently small to be These quantities are constant for each subgraph of a 
manipulated by the compiler. program and the amount of memory needed can be 

_ „. „ easily calculated before the optimization is done. The 

BRIEF SUMMARY OF THE INVENTION m J 0Ty limh that u on thc ^p^, ^ ^ 

Accordingly, the present invention consists in a 5 pre-defined by the compiler writer at one or more levels 

method, in an optimizing compiler using specified pro- or it can be dynamically determined as the compiler 

cedures for identifying ways of improving the quality of assesses the system's resource usage. One reason to 

generated code, for optimizing a program to be com- constrain the amount of memory available to a selected 

piled comprising, prior to carrying out an actual optimi- procedure is also to constrain the execution time of the 

zation: 10 selected procedure. The memory requirements having 

(1) for a specified procedure, developing a control been determined for applying the specified procedure to 
flowgraph representing all possible execution paths the entire program, the memory requirement is com- 
for said program; pared with the memory available and if the available 

(2) identifying subgraphs in said program; memory is sufficient, then the optimization is carried 

(3) performing the steps of: 15 out. If the memory is insufficient* then an attempt is 

(a) selecting a subgraph to be examined for optimi- made to apply the specified procedure to each of the 
zation, said subgraph on a first iteration being the maximal individual subgraphs within the subgraph for 
entire program; which there was insufficient space. For each contained 

(b) by examining the code sequences in said sub- subgraph for which there is still insufficient memory 
graph, determining the number of entities in said 20 available, its contained subgraphs arc optimized; if these 
subgraph which are relevant to each dimension contained subgraphs cannot be optimized, then each of 
of arrays used in said specified procedure to the contained subgraphs will be further broken down in 
represent data flow equations; turn into its contained subgraphs. This iteration contin- 

(c) determining the amount of memory required to ues until a subgraph is examined whose optimization 
contain said arrays; 25 memory requirements are sufficiently small that an opti- 

(d) if said amount of memory exceeds a predeter- mization can be carried out within the allotted memory, 
mined memory usage limit for said compilation and the optimization is then performed. The method of 
thereby denoting an unsuccessful attempt at opti- the invention can be used with one or more optimiza- 
mizing the code in nodes of said subgraph, apply- tions, and when multiple optimizations are performed, 
ing step (e) for said subgraph, otherwise carrying 30 they may be performed in any sequence. 

out said specified procedure on said subgraph; An example of a specific optimization is the optimiza- 

and tion technique known as common sub-expression elimi- 

(e) repeating sleps (b) to (d) for every subgraph nation and code motion. For this optimization, the sub- 
contained within said subgraph for which insuffi- graphs into which the flowgraph is broken down are 
cient storage was found in step (d). 35 the strongly connected regions. One of the sets of data 

to be computed is the set of expressions that are avail- 
BRIEF DESCRIPTION OF THE DRAWING abIe at thc exit of ^ basic block ^ rcgion y^fon the 

FIG. 1 is a depiction of a typical flowgraph of a outermost region being optimized (the "avail" set). In 
portion of a program, as would be built by a compiler this case, two types of entities must be counted to deter- 
according to the invention prior to each optimizing step 40 mine the size of the "avail" set: the number of different 
that is to be performed. expressions, and the number of basic blocks and regions 

FIG. 2 is a flow chart depicting the operation of the including the region being comraoned. 
selection of appropriate subgraphs of a program for The availability information is collected by going 
optimization according to this invention. through the code in the basic blocks of the outermost 

- « region being optimized (i.e. the low order control flow- 

DESCRIPTION OF THE INVENTION graph) ^ marking which CX p rc ssions are computed 

The first step in implementing the method of the and not killed by any subsequent redefinitions of their 
invention is to construct a flowgraph of the program to arguments. These data are then propagated outward 
be optimized. Flowgraph construction is well-known in through the outer level regions to establish which ex- 
the art and for the purposes of the invention, can be 50 pressions are available at the exits of every region. After 
implemented by standard techniques. Then all of the this step, the assumption is made that nothing is avail- 
appropriate subgraphs are identified within the flow- able upon entry to the outermost region and this new 
graph. The choice of subgraphs depends on the optimi- information is propagated back into the inner regions, 
zation procedure to be performed. Strongly connected The need to process data flow information into and out 
regions are often useful, but intervals, hammocks and 55 of the regions of the flowgraph comes from the nature 
trees can also be used, and in extremely contained situa- of the loops which comprise the regions. Clearly, com- 
tions, a subgraph can consist of a single node. In the mon sub-expression elimination benefits from partition- 
description of a preferred embodiment of the present ing for large and complicated programs. The processing 
invention the subgraphs are taken to be the collection of time for propagating information into and out of the 
strongly connected regions. 60 basic blocks and regions of an entire program can be 

The number and dimensions of arrays to be formed very large, and further, the space requirements of the 

during the optimization must be known in order to arrays can be large. By partitioning the program ac- 

write the specified procedure for the particular optimi- cording to the method of the invention, not only the 

zation. Their dimensions will be based upon the quanti- second dimension of the "avail" set, but also its first 

ties of certain entities, for example instructions, results, 65 dimension is affected. In most cases, the universe repre- 

expressions, dictionary entries, and basic blocks in the senting the set of expressions which appear in an inner 

code sequence. The particular entities that are produced region is a subset of the universe for the set of expres- 

depend upon the specified procedure being undertaken. sions found in its outer regions. Thus, by optimizing on 
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the next level of regions in a high order flowgraph, the except that successively higher-level regions are exam- 
size of an array can be significantly reduced because it ined in turn. 

is highly probable that more than one dimension is af- The flow chart of FIG. 2 depicts each of the cases 

fected. described above. The optimization method begins at 21 

DETAILED nFsraiPTiON OF thp 5 wherein a flowgraph of the largest strongly connected 

DETAILED DESCWPnON OF THE region ^ dcveIopcd At 22 ^ Jargcst strongly ^ 

i w v tMi iuin nected region within this flowgraph (beginning with the 

A preferred embodiment of the invention will now be entire program as the first defined strongly connected 
described with reference to FIG. 1, in which a flow- region, as described above) is set to the first SCR. At 23 
graph representing a portion of a typical compiled pro* 10 the amount of computer system memory for performing 
gram is shown. In the embodiment illustrated here, the the specified procedure of optimization to the first SCR 
type of subgraph used is the strongly connected region. 15 determined. At 24 a test is made to determine if the 
Nodes 1 through 15 represent basic blocks of one or required memory for performing the specified proce- 
more lines of code having one entry and one exit point. dure of optimization to the first SCR is less than the 
The entire flowgraph shown in FIG. 1 is a strongly 13 predetermined allowable limit of memory for this pro- 
connected region 20; this is defined as being true, even cedure. If not, at 25 the next largest SCR is renamed to 
though it does not fit the classical definition because no ^e first SCR and the operation loops back to 23 to 
return path exists from within the flowgraph to node 1. determine if the memory required for performing the 
Searching for the next inner strongly connected re- specified procedure to this next largest SCR is less than 
gion can be done by methods known in the art, and is 20 4111011111 of available memory for this procedure, 
not critical to the method of the invention. In the pres- Whcn m SCR te found «° the amount of required 
ent example node 2 has a path returning from node 12, memory for performing the specified procedure is less 
but no return path from any later node. The next inner than thal avaiIa We. « 26 the specified procedure of 
strongly connected region in the flowgraph is region 19, „ optimization is performed on that SCR. which had been 
including nodes 2 through 12, The next contained 25 na med the first SCR. After performance of the specified 
strongly connected region within region 19 is analyzed %2?* m » ; ***** at 27 c ^ ct t cnmne [ f the first 
similarly. Thus a path can be traced including nodes 3 SCR 15 nc5tcd ^ a P mnt S( T K not ' thc <**™- 
through 11 repeatedly, and that defines region 18. Sum- Mt, °" P"*** «P« 10 * e performance of this 
larly, the other two strongly connected regions in the w f^f? f 1 ?*™* If f/ * 28 ^PUtM SCR is 

L f tal5£3^ ;r he spccified proccdurc on * is SCR ^ S avai| - 

STSum ITZftl'T'"* ''T^r/^iH Anothcr cxam P lc of the mode of implementing 
™?t^n^ r Sf ? ,p 1 0 -!? d ? W ? 35 the invcntion » * ven ^ the Pseudocode fragment in 
^ J^l C ° ntr01 WUh ° Ut pUmg t0 * A PP cndix l ' Thc%ode is adapW from an actual pro- 

^1^ C JT ?: t ^ u * ram ' which OP™ 1 * 8 fol,ows: Thc optimizer gets as 

™r k r > l C \7 T n c>n,cd fc 01 ; on the input the memory limits which will constrain the opti- 
program by, first, building the flow^aph shown in It resct5 ^ maps which arc t0 ^ create d t0 

FIG. 1. Then the entire program is tested for an opt.mi- ^ describc the unjversc for thc ^ of enti(ies t0 be used to 
zation procedure. As the compiler works through the the datanow equations for the optimization speci- 

entire program, a count of the entities used by the opti- fied proccdurCt which hcrc ^ commoning and code 
mizing procedure is kept, including the number of enti- motion> It maps md thc basic blocks which m 

ties which determine thc size of the arrays which repre- contained in the given region, and the number of dictio- 
sent the data flow equations As the counts are mere- 45 nary emrics m6 mereni symbolic register uses found 
mented, the unit size of each dimension of each array is in thc gjvcn ^gicm. It computes the size of the arrays 
noted and the total number of elements in the array is used by the specified procedure. It compares this total 
calculated. with lnc a^ted memory, and if the memory require- 

This is then compared with the allotted memory. If me nt exceeds the allotment, then returns the number 
the requirements exceed the allotted memory, then the 50 which corresponds to the region whose nodes were 
next smaller strongly connected region is found as out- m0 st recently mapped. Else, it finds the regions at the 
lined above, and examined in the same manner. When a next contained level. If such a region is found then it 
region is found that is sufficiently small to enable a repeats the steps that were performed previously on the 
successful optimization to be done, the optimization is contained region. If the region contains no regions it- 
performed on that region. The control flowgraph may 55 self, then the optimization searches for the next region 
optionally be rebuilt prior to optimizing on other sped- at the same level. If there is no region at the same level, 
fied procedures. then it looks for regions at the next higher level. It 

In some cases, it may be desirable, after a subgraph repeats ascending through the flowgraph levels until a 
susceptible of being optimized has been treated, to at- region is found or else there are no more levels to as- 
tempt to optimize a larger subgraph. Since application 60 cend. If there are no more levels, it reports to the speci- 
of the selected procedure has reduced the size of the fied procedure that no regions to .be optimized are 
code in the subgraph to which it was applied, it may found in the program. Comments within the code also 
now be possible to treat the containing subgraph as a help to illustrate the method of the invention, 
unit. However, such reexamination increases compila- The invention may be used on any machine using an 
tion time, whereas the thrust of the invention is to limit 65 intermediate level language in the process of compila- 
the compilation time and space during optimization. If tion, including Reduced Instruction Set Computers 
carried out, such a step is performed in the same manner (known generically as RISC computers). It can also be 
as the previous examination of the code sequences, used with a compiler that is in turn used with many 
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different computer systems, by adjusting the memory available to contain them. The code which performs 
limits to be utiliied by the optimizer. By the same register allocation then requires a great deal of process- 
means, the user can obtain varied levels of optimization ing time and memory space in trying to accommodate 
of the same source program. Because the optimizing these register requirements and may result in having to 
occurs at the intermediate language level it can also be 5 put register values into temporary storage to be re- 
used with optimizing compilers dealing with any high trieved later when the values are used. With partitioned 
level language, including Fortran, PL/I, Cobol, C, optimizations in large programs, code movement of the 
SETL and Pascal. Through the use of the invention, at type just described is restricted to inner sub-regions, and 
least partial optimizations can be performed upon very thus a register's assignment is placed closer to its uses, 
large programs that would be impossible to optimize 10 Whilst this may not be the theoretically optimal choice 
because of memory constraints in the computer being if an expression which is moved to the beginning of an 
used to carry out the optimization process. A further inner loop is also a constant in the outer region or re- 
advantage of the invention is the alleviation of register gjons, it frequently avoids the wasted processing of 
pressures which occur when optimizing a large pro- having to reverse the effects of a previous optimization 
gram. For example, when many expressions are moved 15 which results in poorer code quality, 
outside of loops, they exceed the number of registers 

APPENDIX I 

commoninp^map: proc(scrtf) rclums(intcgcr); 



•/ 

/• Procedure Name: commoning_map */ . 



• 



Function: Build maps of the basic blocks and the used items */ 
'* items in the dictionary and the rcg table for a •/ 

* specific region. */ 

V 

/• Inputs: scr? - indicates the node in the flow graph where ♦/ 
. * we start looking for a region for which t *' 

;• build the regjnap and dicjnap. If start = •/ 

nodc_count then we'll try building the maps V 

♦ for the whole program. Otherwise, it means */ 
.*• we've already processed this region and wc *■/ 

* want to find others in the program starling ♦/ 
•'♦ from where wc left off. */ 

•/ 

• Outputs: dicjnap, rcg_map, mapjnfo, nodejnap. */ 



del 5cr^ integer vaJuc; 
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del px-:com bit; 
cc! (i.j.ki integer; 
cci node integer; 

dc! limit integer; ! limit for vector sizes for partitioned commoning. 

del small Jimit litcrally(2048); !2k 

del medium Jimit litcrally( 1024000); ! 1 meg 

del large Jimit literaI1y(2048000); ! 2 meg 

if indebug(TC_OFF) - * 0 then ! if partitioning is turned ofT 

partcom - false: ! then set the flag to false 

else 

do; ! Otherwise, set the flag to true and 

partcom = true; ! choose the correct data space limit, 

if indcbug(TC_S') - = 0 then ! NOTE: the default is the medium limit. 

limit * small_umit; 
else 

if indcbue('PC_L') - 0 then 

limit * lirccjimit; 
eise 

limit = medium Jimit; 
end: ! else 

! Allocate the arrays used to map the entities which will be collected 
I by commoning. 
if allocation(dic_map) = 0 then 
do: ! if dic_map isn't allocated then none of the maps arc allocated. 

allocate dic_map extcnts(dicJah_top); 

allocate rcg_map extcntsflast _reg): 

allocate unmap_reg e*t en tsflast_reg- first _reg + 1): 

allocate nndcjnap extents fnode_count); 
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allocate unmap_nodc extents (nodc_count); 
end; ! if al)ocation(dic_map) = 0 

! Find a region small enough to work with.., 
do while (scr±f - = 0); 

! Reset the maps for the region 

nexweg. next_dic = 0; 

$zero(dic_map); Szero(rcg_map); $zcro(unmap_rcg); 
Szero{node_map); Szero(unmapjiode); 

) The loop pre_header node maps into position 1 of unmap_bb and 
! the post exit node which is always basic block U) maps into 
f position 0 which is not explicitly done here since nodejnap is 
! zeroed above. 

unmar_nod?(0) = post_cxit; 

node_map(pre_hcadsr) = 1; 

urunap_nod?n) = prejicadcr; 

nodc_map(post_exit) = 0; 

curr_nodf = I: 

map_nodefscr=); 

! Loop thru the instructions in the nodes that have been mapped. 

do k = ! to cun_node; ! don't need to look at the po.M_cxit node 
! since it is a "fictional" node which contains 
! no code and is used to simplify optimization 
! algorithms. 

node « unmap_nodc(k); 
if scr(nodc) then iterate; 
else 

do i = cnch instruction in the basic block "node", 
do j = each operand in the cuncnt instruction T 
if j is a register and hasn't been mapped yet then... 
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ncxt_reg = ncxl_nrg + 1 ; 
rcg_map(j) = next _rcg; 
unmap_rcc( ne xt_rcg) 15 reg; 
end; ! if j is a register 
else 

if j is a dictionary cnirv and hasn't been mapped yet then... 
ncxt_dic = next_dic + 1; 
dic_map(j) = next_dic: 
end; ! if j is a dictionary entry 
end do j; 
end do i; 
end do k; 

! If partitioned commoning is turned off OR ELSE partitioned commoning 
! IS being performed and this region ISN'T loo big. then perform 
I commoning on this region, 
if ( ^paricom) |* ( " , too_bic) then 
return (scr#); 

! Otherwise, look for another rceion to map within this region, 
do i = each node in the region 
if scr(i) then leave; 

end do i; 

J If there arc no inner regions in scr#, find the next one 

! at the same or higher level. 

if i > number of nodes in the region "scr#~ 

sct# » find_nextj-egion(scT#); 
else ! examine the first inner region of "scr£" 

scr= = i; 
end; ! do while (scr= = 0) 
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! If we reached here, then there are no more regions which can be 

! optimized. 

rctum(O); 

map_nodes: proefsen?); 

! Recursive procedure to map all the basic block and scr nodes contained 
! in the region "scrW". 
del ?cra integer value; 
del i integer; 

! Map each node in the region *sct#". 
do i * each node in the region 

! If this node is itself a strongly connected region, then map its 

! nodes first. 

if scr(i) then 
map_nodes(i); 

else 

do; ! Not a strongly connected region (therefore, it's a basic block) 
! so go ahead an map it. 
curr_nodc - curr_nodc +1; 
node_map(i) = curr_node; 
unmap_node(cuiT_nodc) = i; 
end; ! else 
end do ii; 

! After mapping all of its nodes, add the "scr#" node to the 
! nodc_map, too. 
currjiodc « cun_node + I; 
node_map($cri:) = cun_nodc; 
unmap_node(curr_node) - sen?; 

end: ; map_nodcs 
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too_big: proc returns (bit); 

r 

! VICTOR VECTOR DIMENSIONS SIZE OF A SINGLE VECTOR ELEMENT 

i • 

! avail regs nexweg x nodejrount 2 bytes (half a word) 

! dend_rces ncxt_reg x nodecounl 2 byles 

! avail dics ncxt_dic x node_count 2 bytes 

I regJciJls next _reg x next_rcg I byte (one quarter of a word) 

I 

if ( 2 4 (ncxt_rcg * nodejrount • .5) + ! sizes for availjxgs and dead^regs 

! • (next_dic * nodejrount * .5) + ! size for avail jiics 

l ♦ (next_reg 4 next_reg ♦ .25) ! size for regjcills 
) > limit then ! If total amount of memory needed by these 
retum(true); ! vectors is greater than the limit then return 
else ! true because the region is too bit to be commoned. 

return(faJsc); ! Otherwise, return false, 
end; ! too_big 

fmd_next ^region: proc(rcgn) retums(intcger); 

; 

f This procedure take a region number and finds the next region in the 
! flow graph which is at the same or higher level as "rcgn". The node 
! number of that next region will be relumed. If there is no such 
! region, the value 0 is returned to indicate this. 
I 

del regn integer value; 

del cr integer; ! containing region 

del i integer: ! loop counter 

I We quit if we arc already at the outermost region level, 
do whilcfrcgn * nodc_count): 

! "cr" is the region which contains "rcgn" 

cr - contaknincjegionfrcgn); 



5,193,190 



18 



12/02/2003, EAST version: 1.4.1 



5,193,190 

19 20 

! Loop through the remaining siblings in the containing region. 

do i = regn+ 1 to last node in *cr" 
! If there is another region, return its number 
if scr(i) then retum(i); 
end do i; 

! At this point, wc have to ascend a level and search the siblings 
! of regn's containing region, 
regn - cr; 

end: ! do while (regn " = node_count); 

! No more recions to examine... 
retum(0V, 

end; ! find_nextj-ecion 



end commoninc_map; 



We claim: 

1. A method for operating a compiler program in a ' 
computer system, said computer system utilizing said 
compiler program to execute a specified procedure for 
improving an intermediate language version of a com- 
puter program under development, comprising the 
steps of: 

a) developing a control flowgraph representing a 
largest strongly connected region in said program 
under development; 

b) by examining code sequences in said strongly con- 
nected region, determining an amount of memory - 
of said computer system required to perform said 
specified procedure to said strongly connected 
region; 

c) comparing said amount of memory determined in 
step b) to a predetermined computer system mem- < 
ory usage limit to determine if said amount of mem- 
ory determined in step b) exceeds said predeter- 
mined computer system memory usage limit, 
thereby denoting an incapability of said computer 
system of applying said specified procedure to said ' 
strongly connected region; 

d) if said amount of memory determined in step b) 
does not exceed said predetermined limit, applying 



said specified procedure to said strongly connected 
region examined in step b); otherwise 
e) if said amount of memory determined in step b) 
exceeds said predetermined limit, executing steps 
b)-d) to a next largest previously unselected 
strongly connected region in said control flow- 
graph. 

2. A method as claimed in claim 1, in which said 
specified procedure is for performing code motion in 
said intermediate language version of said development 
computer program. 

3. A method as claimed in claim 1, in which said 
specified procedure is for performing store motion in 
said intermediate language version of said development 
computer program. 

4. A method as claimed in claim 1, in which said 
specified procedure is for removing redundant code 
sequences in said intermediate language version of said 
development computer program. 

5. A method as claimed in claim 1, in which said 
specified procedure is for removing useless code se- 
quences in said intermediate language version of said 
development computer program. 

6. A method as claimed in claim 1, in which said 
specified procedure is for simplifying expressions in said 
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intermediate language version of said development 
computer program. 

7. A method for operating a compiler program in a 
computer system, said computer system utilizing said 
compiler program to execute a specified procedure for 5 
improving an intermediate language version of a com- 
puter program under development, comprising the 
steps of: 

a) developing a control flowgraph representing a 
largest strongly connected region in said program 10 
under development, said largest strongly con- 
nected region being a parent strongly connected 
region and including one or more nested strongly 
connected regions; 

b) defining said largest strongly connected region as a 1 5 
first strongly connected region; 

c) by examining code sequences in said first strongly 
connected region, determining an amount of mem* 
ory of said computer system required to perform 
said specified procedure to said first strongly con- 20 
nected region; 

d) comparing said amount of memory determined in 
step c) to a predetermined computer system mem- 
ory usage limit to determine if said amount of mem- 



ory determined in step c) is less than said predeter- 
mined computer system memory usage limit, 
thereby denoting a capability of said computer 
system of applying said specified procedure to said 
first strongly connected region; 

e) if said amount of memory determined in step c) is 
less than said predetermined limit, applying said 
specified procedure to said first strongly connected 
region examined in step c) and determining if said 
first strongly connected region is nested within a 
first parent strongly connected region in said con- 
trol flowgraph, and if so; 

0 determining if said first strongly connected region 
is nested within a first parent strongly connected 
region in said control flowgraph, and if so; 

f) defining said first parent strongly connected region 
as said first strongly connected region and repeat- 
ing steps c)-e); and 

g) if said amount of memory determined in step c) is 
greater than said predetermined limit, defining a 
next largest previously unselected strongly con- 
nected region in said control flowgraph as said first 
strongly connected region and repeating steps 
c>-0. 
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instruction is a conditional one, there is no method of 

METHOD FOR OPTIMIZING COMPUTER CODE determining in advance whether r not the jump will be 

TO PROVIDE MORE EFFICIENT EXECUTION ON executed. 

COMPUTERS HAVING CACHE MEMORIES JUMP instructions cause additional problems for 

5 pipelined central processing units. In a conventional 
BACKGROUND OF THE INVENTION pipelined processor, a number of instructions are being 
The present invention relates to computer systems processed at any given time. Each time an instruction is 
and, more particularly, to an improved method for executed, a new instruction is introduced into the 
generating computer code which executes more effi- "pipe" If a jump is executed, the next instruction in the 
ciently on computers having cache memories and the 10 pipe » not likely to be the target of the jump. Hence, a 
liifp portion of the pipe must be emptied and reloaded start- 
Conventional computing systems include a central ing with the target instruction. Hence, computer code 
processing unit which executes instructions which are which tnmi^wja the use of JUMP instructions is highly 
stored in a memory. The cost of providing higher speed desirable. 

central processing units has decreased much faster than 15 Broadly, it is the object of the present invention to 

the cost of computer memory. This decrease in cost at provide an unproved computer code optimization 

the central processing level has resulted from such inno- method 

vations as reduced instruction set architectures and It is another object of the present invention to pro- 
instruction pipelining, Similar advances have not taken vide a code optimization method that places code 
place in memory circuitry. In addition, the sire of com- 20 blocks which are executed in sequence in memory loca- 
puter programs has increased as more complex tasks are tkms which are close to each other, 
set-up for computer implementation. This increased it is yet another object of the present invention to 
program size further aggravates the central processor provide a code optimization method that reduces the 
memory mismatch. number of JUMP instructions in the executed computer 

One solution to the mismatch between central pro- 25 code, 

cessing speed and the cost of computer memory having These and other objects of the present invention will 

comparable speed is the introduction of intermediate become apparent to those skilled in the art from the 

memory units. Small high speed cache memories are fo ji owill g detailed description of the invention and the 

placed between the central processing unit and the accompanying drawings, 

computer's main memory. The cache memory has a 30 ° ^ 

speed which is comparable to that of the central pro- BRIEF DESCRIPTION OF THE DRAWINGS 

cessing umtihence, instruction .fetches .from the cache fl chart for a basic block positioning 

do not introduce processing delays. The size of the * k 7, trt - invMlt5 «« 

cache is much less than that of the main memory; hence, mc i?? d ac f* rdm * to the present invention 

£cos"S^^ 35 ^■'""T^^JS.M 

than a main memory having the speed of the central mVCnU ° n ^ to 

P ^n7he U central processing unit requests an instruc ™- 3 ^ustrates the benefits obtained by splitting 

tion, the request is intercepted by the cache. If the in- pi ^!™ ur 5?\ ^ « . - , v • 

struction inquestion is already stored in the cache, it is 40 ^ FIO. 4(a) shows i the tradittonal ordemgof basic 

returned to the central processing unit If the instruction block *' FIG - ^ows the ordering after Procedure 

is not present, the cache loads a small block of memory Splitting has been applied. 

locations which includes the requested instructions ^ GS * fl > ** d flustnte the manner in which 

from the main memory. The cache then returns the procedures are ordered according to the present inven- 

requested instruction. The need to load the small block, 45 tion * 

referred to as a "line", delays the execution of the pro- SUMMARY OF THE INVENTION 
gram; hence, such cache loads must be limited to a small ... 
fraction of the instruction fetches if the cache architec- The present invention comprises a code optimization 
ture is to provide a substantial speed enhancement. This method for providing more efficiently executable corn- 
will be the case if the computer code is organized such 30 puter code for running on a computer system having 
that the execution sequence of the code is the same as cache memories and the like. The method uses statisti- 
the storage sequence. That is, if instruction A is to be cal information obtained by running the computer code 
executed after instruction B, instruction B should be with test data to determine a new ordering for the code 
stored immediately after Instruction A In this case, blocks. The new order places code blocks that are often 
each instruction in each line of code will be executed in 55 executed after one another close to one another in the 
the order stored and cache loads will only be needed at computer's memory. 

the end of each line of code. Furthermore, the cache In one embodiment of the method of the present 

loads at the end of lines could then be anticipated in invention a first computer program is converted to a 

advance and thus performed before the lines in question second computer program. The first computer program 

were actually requested. 60 comprising a set of basic blocks of computer code ar- 

The presence of JUMP instructions in the computer ranged in a first linear order, and the second computer 

code results in difficulties in providing this ideal code program comprises a second set of basic blocks in a 

organization. When the central processing unit encoun- second linear rder, each bask block of said second set 

ters a JUMP instructi n, the next instructi n will be at of basic blocks corresponding to a basic block of said 

a point in the code which may be very distant from the 63 first set f basic blocks. The method first determines the 

location at which the JUMP instruction is stored. frequency with which each basic block of said first set 

Hence, the cache may not contain the line which in- of basic blocks transfers control to each other basic 

eludes the next instruction. Furthermore, if the JUMP block f said first set of basic blocks when said first 
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computer program is executed with predetermined test U(lta for error) tbeo (handle error case) (remaining 

data. An indicator is provided for each said pair of basic code) 

blocks havmg a determined transfer frequency greater Whcn ^ ^ u emptied, ^ compiler typically 
than zero, said indicator having a first state inoocaUng ^ ^ code for ^ error test at the end of a basic 
that said pair of basic blocks has not been examined and 5 ^ & branch around the code to 

a second state mdicaung that said pair of basic blocks ^ dtuation( ln normal ^ there is 

has been examined. TTie indicators are miualized to the Q0 ^ ^ branch b ^ a rcsulti not 

first sute. The pair of basic blocks for which said mdica- ^ ^ m ^ cac h e ^ containing the 

tor is in said first state having the largest transfer fire- bnulch instruction, or the cache line containing the 
quency is then determined and said indicator set to said ™ u ^ ^ exccutc d. Incompletely using some cache 
second sute. A pair of chains, each comprising one or ^ m total number of cache lines used 

more basic blocks, is combined if the source basic block Wrf|er ^ would ^ expected from the number 

Tor the selected pair of chains is the tail of one of said of instructions actually executed, 
chains and the target block is the head of the other of ^ tf^on, on pipelined computers, the hardware 
said chains, said source and target basic comprising said oftcn predicts that forward conditional branches will 
pair of basic blocks. The combining steps are repeated ^ ^ ^ ^ toward conditional branches 
until all said indicators are in said second state. The ^ ^ For ^ co6c generated in this example, 
chains are then arranged in a linear order to form said the prediction f or the forward branch is usually incor- 
second computer program. rect This leads to wasted cycles resulting from the need 

In a second embodiment of the present invention, a to n]oa6 ^ pipeline. The deeper the pipeline, the 
computer program comprising a plurality of procedures larger the penalty. 

is modified to cause the computer program to be loaded jj^ p Tescnt invention identifies such cases and moves 
in a more efficient memory order by determining the ^ infrequently executed code away from the more 
frequency with which each said procedure calls each 25 frequently executed code so that the normal flow of 
other said procedure in said computer program and control is in a straight-line sequence. The target and 
placing the procedures in the computer's memory in an sense 0 f the conditional branch are changed to reflect 
determined by said determined frequencies. th c new ordering of the code. As a result, longer se- 

quences of code are executed before a branch is taken. 
DETAILED DESCRIPTION OF THE 30 Thus, on average, the number of instructions executed 

INVENTION per cache line increases, and there are fewer caches 

misses. Furthermore, branch penalties are reduced since 
The present invention utilizes two passes to optimize the hardware prediction matches normal code execu- 
the computer code. In the first pass the code is compiled tion. Finally, in some cases, moving the infrequently 
and special code inserted which collects statistics on the 33 executed code results in the elimination of an uncondi- 
subsequent execution of the code. This special code will tional JUMP instruction. 

be referred to as the instrumentation code in the follow- When the code is instrumented in the first pass, a 
ing discussion. The instrumented code is then executed small block of code is added to each basic block for 
using representative test data. During the execution, the each possible transfer to that basic block. This code will 
instrumentation code records data indicating the num- 40 ^ referred to as a basic block stub in the following 
ber of times each block transfer control to the other discussion. When the compiler encounters a JUMP 
blocks in which various blocks of code were actually instruction, it transfers to the target of the JUMP in- 
used. In the second pass, this data is used to reorder the struction and inserts a basic block stub. The basic block 
code such that instructions that are often executed after stub comprises storage space for a counter used to re- 
one another are stored close to one another in memory. 45 cord the number of times the jump is executed, data 
The reordering operation also eliminates a significant specifying the source of the JUMP instruction, and a 
number of JUMP instructions in the portion of the code JUMP instruction to the beginning of the basic block. A 
which is most frequently executed, thereby reducing basic block stub is inserted for each possible JUMP 
the program size and further improving the efficiency instruction having the basic block in question as its 
with which the code is executed. 50 target The counters are initialized to zero by the link- 

The smallest unit of computer code that is moved by age editor prior to the execution of the code with the 
the method of the present invention will be referred to test data. 

as a basic block. A basic block is defined to be a straight After the code has been execute with the test data, the 
tine section of code with a single entry point at the present invention examines the basic block stubs and 
beginning and a single exit point at the end. The basic 55 constructs a graph having arcs showing the various 
block was chosen as the smallest entity to move because jumps and the number of times each JUMP instruction 
once a basic block is entered, it is guaranteed that all of was executed. The code is then reordered, 
the instructions for that block will be executed in order. The reordering combines basic blocks into chains. A 
Hence, no further improvement in cache loading can be chain is defined to be a sequence of basic blocks that 
obtained by reordering the code within the basic block. 60 should be a contiguous section of straightline code. A 
Basic Block Positioning source basic block is defused to be a basic block ending 

Within most computer programs, there are control in a branch instruction such as a Jump. A target basic 
flow paths and basic blocks that are seldom executed block is defined to be the basic block which begins with 
during typical program runs. Conventional compilers the target of a branch instruction. It should be noted 
generate code in a manner in which these rarely used 65 that every basic block is a target of some other basic 
sections of code are interspersed with frequently used block, and every basic block is also a source basic block 
sections. For example, consider the code for an error in a computer program running under a standard oper- 
check in the form: ating system. 
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Three rules are used to construct chains. First, two 
chains can be merged if the source basic block for the 
selected arc is the tail of a chain and the target block for 
the selected arc is the head of a chain* Since all basic 
blocks are themselves chains, this rule will always be 5 
true for the first arc selected. 

Second, if the source of the arc is not a tail of a chain 
or the target of the arc is not a head of a chain, then the 
chains are not merged. However, such arcs are used to 
define a precedence relation among the chains. The 10 
precedence relationships are used to guide the final 
placement of the chains. When two chains cannot be 
merged, the chain containing the source is given prece- 
dence over the chain containing the target if the chain 
containing the source basic block consists of a condi- 15 
tional branch to a target basic block in the second chain. 

If the source basic block for the current arc contains 
a conditional branch, then the current arc contains one 
of the two possible targets. If the source is not the tail of 
a chain, then the source has already been chained with 20 
the second of the possible targets. Since arcs are exam- 
ined in the order of the frequency of use from most 
frequently executed to least frequently executed, the 
source basic block in question will have already been 
chained with the dominant target. On pipelined ma- 25 
chines, the lesser taken path of a conditional branch 
should be a forward branch in order to avoid penalties 
resulting from incorrect branch prediction. Thus, since 
the most common target basic block for the branch has 
already been chained to the branch, the target for the 30 
current arc should be a forward branch from the branch 
point, This is the rationale behind giving the chain con- 
taining the source node precedence over the chain con- 
taining the target node 

As will be discussed in detail below, the above men- 35 
tioned two rules do not always resolve the ordering of 
all of the basic blocks. If there are any chains for which 
a precedence relationship has not been specified by the 
second rule, a third rule is applied. The third rule estab- 
lishes the precedence of remaining chains by giving 40 
precedence to a chain currently lacking precedence if 
this would result in the second rule being satisfied for 
the arc having the largest count. 

FIG. 1 is a flow chart for the basic block positioning 
method as described above. The method examines each 45 
of the arcs in the graph. If any arc has not been exam- 
ined, the method finds the arc having the largest count 
among the unexamined arcs as shown at 10 and 12. The 
method then attempts to join the two chains connected 
by this arc as shown at 14. If the two chains can be 50 
linked then they are linked as shown at 16. The arc is 
then removed. If the two chains can not be linked, the 
arc is merely marked as being examined and the next 
unexamined arc is found. 

When all of the arcs have been examined, the pro- 55 
gram applies the second rule to establish precedence 
between any remaining chains as shown at 18. If there 
are any remaining arcs, the method then attempts to 
remove them by satisfying the second rule for the arc 
having the largest count. 60 

This reordering operation may be more easily under- 
stood by considering a simple example. FIG. 2 shows 
the basic block graph of a procedure with each arc 
labeled with its count as determined in a test run. The 
source code was a while loop written in the C language. 65 
Within the body of the loop were several if-then -else 
tests along with two alternate break statements at basic 
blocks G and M. The reordering starts by finding the 



6 

arc with the highest count which is the arc from basic 
block B to C (count =6964) and joining these two to- 
gether into a chain. The next largest arc (6950) from 
basic block C to D is then found. Since basic block C is 
at the end of a chain, the B-C chain can be extended to 
include D. The next three largest arcs (N-B, D-F, and 
E-N) also extend the existing chain. The next largest arc 
(D-E) can not be used to extend a chain, since basic 
block D is already in the middle of a chain. The process 
is repeated with the remainder of the arcs. The final 
result will be six chains, three of which being individual 
basic blocks: 



1) A 

2) E-N-B-C-D-F-H 

3) 1-J-L 

4) O-O 

5) K 

6) M 

After the chains have been formed, a precedence 
relation is attempted between them based on the desire 
to have non-taken conditional branches be forward. 
There are 6 conditional branches in this procedure. The 
second rule discussed above prescribes that chain 2 
containing B-C should occur before chain 4 containing 
0. If these two chains are placed in this order, then the 
less frequently executed path out of B is the forward 
branch as desired. 

The other conditional branches at C,F, I and J imply 
that the following order among the chains: 



chain 2 <C) before chain 4 (G) 

chain 2 (F) before chain 3 (I) 

chain 3 (I) before chain 6 (M) 

chain 3 (J) before chain 5 (K) 



There still remains one conditional branch left at D. 
Hence, the second rule will not always provide a com- 
plete set of precedence relationships. As noted above, it 
is preferred that the branch which is less frequently 
taken be forward. Hence, the branch from D to E 
should be forward (since that link is not taken as often 
as the path from D to F). But since E and D are part of 
the same chain, and E occurs before D in the chain, that 
branch is necessarily backwards. In other examples, it 
sometimes happens that the precedence relation be- 
tween chains is not transitive. In these cases the reorder- 
ing method resolves such conflicts by looking at the 
weights of the arc and choosing an order that will sat- 
isfy the precedence of the arcs having the higher 
weight. 

After the precedence criteria are determined between 
chains, the chains are arranged so that the order is satis- 
fled consistent with the above rules. If there is some 
freedom in choosing the next chain, the chain con- 
nected to existing chains by the heaviest count is 
chosen. In this example, the final order of the basic 
blocks is: 

A, E-N-B-C-D-F-H, I-J-U O-0, K, M 

An alternative ordering method which is less opti- 
mum then the one described above may also be used. In 
the alternative method, the chains are started with the 
entry point basic block. At each point, the computer 
tries to place a successor of the last placed basic block. 
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If there is more than one available, the one with the FIG. 3 shows an example of three procedures that 
higher arc count is chosen. In this example, the method would normally require two pages of memory. By split- 
starts with basic block A. It only has one successor, B, ting the primary portions away from the fluff, the pri- 
so that would be placed next. B has two successors, but mary code for the procedures can be contained within a 
C is connected with the higher arc count so it would be 3 single page. 

chosen next. Continuing in this way, blocks D, F, H and When separating the fluff basic blocks from the pri- 

N are placed. mary blocks for a given procedure, a new procedure is 

After N is chosen, its only successor has already been defined to encompass the fluff blocks. This procedure is 

selected, so a new starting place must be chosen from atypical in that it does not adhere to the standard proce- 

among the unselected basic blocks. In this example, E is M> durc calling conventions. There are no defined entry or 

selected since it is connected to the already chosen *»t «"* no register saves or restores. This 

blocks by a weight of 3117. Note that its only successor method was selected to avoid the overhead associated 

has also already been chosen; hence, another new start- ™* * stodard procedure caU whenever control trans- 

ing point is needed. The computer starts again with an *tween » Pnnwy and fluff procedure. The new 

unselected block, I. The final order after the algorithm " procedure ls also given a higher sort key so that the 

completes is A, B, C, D, F, H, N, E, I, J. L, 0, K, G. and linker will force all fluff procedures to the end of the 

u program space. 

Both of the above methods do a good job of making u . After ' 1 ?"* ^ /J™? ^ u 2?? ^ 
conditional branches into forward branches that are ,„ Z™*™"*^*^^ Tl^SS?^ 
usually not taken. However, the first method has the 20 ftonl 8 Pnmarybas.c block to a fluff to.c block or 

added "advantage of removing unconditional branches. ^I'^ " " ~ ma! £ 

1T - - f j • *u r j . * _ r branch is a JUMP instruction whose target may be 

Hence, the first method is the preferred embodiment of ^ ^ fl predetermined distancc from the JUMP, 

the present invention. ........ . Long branch instructions take longer to execute, and 

A uv uL neW ° f rdenng 1 of the b !S C K b,OC ^ h f 25 hence, are normally avoided. However, these instruc- 
estabhshed the frequently executed basic blocks (pn- ^ ^ ]ess of roblcm m thc t nation, since 
mary basic blocks) are now found toward the top of the ^ m Qn , ^ execui ^ ^ of ^ long 

procedure while the ^frequently executed blocks are brjmch ^ ^ the ori ^ nal ^ t ^ Wock A long 
found near the end. Basic blocks that were never exe- branch fa uscd eventually the fluff procedure 

cuted according to the coNected profiling data will be ^ wi]1 most likely ^ locatcd too far from it& primary 
referred to as fluff. The fluff ends up at the very end of countcrpart to ^ accessed by a short branch instruc- 
ts procedure. ^ 

Although this ordering and rearrangement of the These branches are allocated at the end of the proce- 
code is a significant improvement over that generated dure ^ ^ doing thc inicrprocedural branch. Placing 
by the compiler, there is still room for further improve- 35 ^ branch at the end of the procedure instead of con- 
ment In particular, the code would execute more eftV vcrting the orig inal branches into long branches was 
ciently if the fluff were moved to a distant location choscn for simplicity. If there are multiple branch sites 
thereby allowing all of the primary basic blocks to be which j^ve the same branch target, a single long branch 
grouped together. This separation process is referred to ^ shared to save space. 

as procedure splitting since it results in part of the code ^ since the paths between primary and fluff code were 
in a given procedure being relocated to a remote area of never x^ en according to the representative profiling 
memory. data, the long branches should be infrequently executed 

Procedure Splitting and only for atypical inputs. 

Procedure Splitting is the process of separating the An example of an if-then-else sequence where the 
fluff basic blocks of a procedure into a separate second- 45 then clause is considered to be fluff is shown in FIG. 
ary region of the main memory to minimi zc the size of 4(a)-{b). FIG. 4(a) shows the traditional ordering of 
the primary part of the procedures. By producing a basic blocks. FIG. 4(A) shows the ordering after Proce- 
smaller and denser primary procedure, more primary dure Splitting has been applied, 
parts of procedures can now be packed into a single The procedure splitting method of thc present inven- 
memory page. Hence a reduction in the working page 50 tion has been described with respect to basic blocks 
set size is often obtained. This is a significant advantage which were not utilized by the program while running 
of the present invention on computers which utilize the test data. However, it will be apparent to those 
virtual memory systems. skilled in the art that any basic block which had a fre- 

Virtual memory systems are analogous to the main quency of utilization less than some predetermined f re- 
memory -cache memory arrangement discussed above. 55 quency could be moved in the same manner. 
In this case, the slow memory usually consists of a disk Procedure Ordering 

drive, and the fast memory consists of the computers In the preferred embodiment of the present invention, 
main memory. Large programs are stored on the disk. yet another layer of code optimization is utilized. The 
As the instructions are needed, a portion of the pro- above described basic block reorganization is preferably 
gram, referred to as a page, is read from the disk into the 60 performed on basic blocks within a single procedure, 
main memory of the computer. A page, in turn, consists Most computer programs consist of a plurality of proce- 
of a plurality of lines which are read into the cache dures. Hence, in the preferred embodiment of the pres- 
memory prior to execution by the central processing ent invention, the reordering process is also performed 
system. By reducing the number of pages needed to at the procedure level. 

store the commonly used code, the number of times the 63 This separation is made because the compiler nor- 
desired page is not already in memory when requested mally has access to the basic blocks within a procedure 
is reduced. By moving the fluff, fewer pages are needed but can only be assumed to have the code for a single 
to hold the working sections of the program. procedure available at any given time. The linkage 
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editor, on the other hand, has access to the code gener- 
ated by the compiler for all of the procedures, but does 
not have the code for basic blocks in a form which is 
convenient for inserting the instrumentation code. 

In the preferred embodiment of the present invention, 5 
the procedure ordering is implemented in a separate set 
of passes. The first step in ordering the procedures is to 
construct a weighted call graph using the data collected 
by the instrumentation code. The linker introduces the 
procedure instrumentation code in a manner analogous 10 
to that described above for the basic blocks. The instru- 
mentation is setup by the linker because only the linker 
is assured of having access to all procedures at the same 
time. The basic instrumentation code is similar to that 
described above with reference to the basic block, and 15 
hence, will not be described in detail here. For the 
purposes of this discussion it is sufficient to note that a 
table is setup having a counter for each combination of 
calling and called procedure as is done for source and 
target basis blocks. Since the linkage editor already has 20 
a table giving the location of all calls and the procedure 
called, this is a straight forward extension of a conven- 
tional linkage editor. This table is initialized by the 
tinker prior to running the program with the sample test 
data. 25 

After running the program, a call graph is con- 
structed. Initially, each node of the graph is a single 
procedure and the edges correspond to calls between 
the procedures. The edges are weighted by the number 
of times the calls were actually made. If a procedure 30 
calls another from several different places within itself, 
or if two procedures were mutually recursive, those 
weights are first summed together to form the edge 
weight shown in the graph. 

Broadly, the method of the present invention places 35 
procedures in memory based on the frequency with 
which said procedures called each other when execut- 
ing the test data. The linkage placement of the proce- 
dures is determined by constructing groups similar to 
the chains described above with reference to basic 40 
blocks. However, the rules for ordering of the groups 
are somewhat different. Initially each group consists of 
one of the nodes in the call graph. The link order is 
constructed by first choosing the edge with the heaviest 
weight. The two nodes connected by this edge will be 45 
placed next to each other in the final link order to form 
a new group. However, the precedence of one node 
over the other in the group is not determined at this 
stage. This node will be referred to as an incompletely 
determined node. The two nodes are merged into a 50 
single node or group having two procedures therein, 
and the remaining edges leaving each node are co- 
alesced. 

This process is repeated until the two nodes being 
combined include at least one node that consists of an 55 
incompletely determined node. At this point, the nodes 
are combined, and the order of the various procedures 
in the node is determined. The order is determined by 
first making a list of the possible orders. Those entries in 
the list that place procedures next to one another that 60 
were not linked in the original call graph are then re- 
moved from the list If there is more than one possible 
order left, the one which places procedures closest to 
each other that had the highest count connecting them 
in the call graph is used. Once the order has been deter- 65 
mined, the linked procedures in the node are treated as 
a single "procedure" or an incompletely determined 
node in the remaining steps. The process is repeated 



until the resulting graph consists of an individual node 
or nodes with no edges. 

An example of this process is shown in FIG. S. FIG. 
5(a) shows the graph prior to any mergers. The edge 
with the highest weight is the one connecting proce- 
dures A and D. Hence, procedures A and D will be 
placed next to each other in the final link order. How- 
ever, the precedence of A and D will not be decided 
until later in the process. The two nodes are then 
merged and the process repeated. The next edge will be 
the one between procedures C and F. Again, it is deter- 
mined that C and F will be placed next to one another 
in the final link order; although it is not known which 
will come first After nodes C and F are merged, the 
graph will appear as shown in FIG. 5(6). 

The next highest weight is that between the two 
nodes that were just built by the previous mergers. 
Since it has already been decided that A and D should 
be together and that C and F should be together, there 
are four distinct choices available for ordering of this 
next merger: 

(1) A-D-C-F or F-C-D-A 

(2) A-D-F-C or C-F-D-A 

(3) D-A-C-F or F-C-A-D 

(4) D-A-F-C or C-F-A-D 

The second order in each case is merely the reverse of 
the first order. To determine which of these to choose, 
the original connections between the procedures are 
examined. First, the choices involving connections that 
did not exist in the original graph are eliminated. In 
particular, procedure F is connected neither to proce- 
dure A nor to procedure D in the original graph. 
Hence, possibilities (2) and (4) are eliminated. If there 
are any remaining choices, preference is given to the 
order that reflects the strength of the original ordering. 
For example, procedure C was more strongly con- 
nected to procedure A (weight of 4) in the original 
graph than to procedure D (weight of 3). Hence, it is 
preferred that C and A be adjacent in the final order, as 
long as the already determined groupings can be satis- 
fied. Hence, the grouping D-A-C-F or its equivalent 
grouping F-C-A-D is preferred. 

This process continues until the graph has no edges 
left. If the graph was a connected graph, then the final 
step will lead to a single node. If the graph was disjoint, 
then there will be several independent nodes. This later 
case can occur when a procedure is not called at all for 
the test data used to generate the graph. 

There has been described herein a method for opti- 
mizing computer code. Various modifications to the 
present invention will become apparent to those skilled 
in the art from the foregoing description and accompa- 
nying drawings. Accordingly, the present invention is 
to be limited solely by the scope of the following claims. 

What is claimed is: 

1. A method for operating a digital computer, includ- 
ing a memory for the storage of computer instructions 
and data, to convert a first computer program to a sec- 
ond computer program, said first computer program 
comprising a set of basic blocks of computer code ar- 
ranged in a first linear order and said second computer 
program comprising a second set of basic blocks in a 
second linear order, each basic block of said second set 
of basic blocks being derived from one of said basic 
blocks of said first set of basic blocks, said method com- 
prising the steps of: 

(a) determining the frequency with which each basic 
block of said first set of basic blocks transfers con- 
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trol to each other basic block of said first set of 
basic blocks when said first computer program is 
executed with test data; 

(b) providing an indicator for each pair of said basic 
blocks of said first set of basic blocks having a 5 
determined transfer frequency greater than zero, 
each said indicator having a first state and a second 
state, each said indicator being initialized to said 
first state; 

(c) determining which pair of basic blocks of said first 10 
set of basic blocks for which said indicator corre- 
sponding to said pair is in said first state has the 
largest transfer frequency and setting said indicator 
corresponding to said pair to said second state; 

(d) combining a pair of chains of said first set of basic 15 
blocks to form a new chain which replaces said pair 
of chains if a source basic block for said pair of 
chains is the tail of one of said chains and a target 
basic block is the head of the other of said chains, 
said source basic block and target basic block being 20 
said pair of basic blocks determined in step (c); 

(e) repeating steps (c) and (d) until all said indicators 
are in said second state; and 

(0 arranging said chains in a linear order to generate 
said second computer program. 25 

2. The method of claim 1 wherein said step of arrang- 
ing said chains in a linear order further comprises the 
step of placing a first one of said chains in front of a 
second one of said chains if said first chain contains the 
source of a conditional branch and said second chain 30 
contains the target of said conditional branch. 

3. The method of claim 2 wherein said digital com- 
puter further comprises means for loading said second 
computer program into said computer memory at mem- 
ory locations specified by labels in said second com- 35 
puter program and wherein said step of arranging said 
chains further comprises the step of placing the target 
basic block of each pair of chains combined in step (d) 
having a determined transfer frequency less than a pre- 
determined value into a code segment separate from 40 
that containing the source basic block of said each pair 
of chains combined in step (d), and labeling said sepa- 
rate code segment in a manner that will cause said sepa- 
rate code segment to be placed in a specified area of 
computer memory when said second computer pro- 45 
gram is loaded into said digital computer prior to execu- 
tion of said second computer program. 

4. The method of claim 3 further comprising the step 
of converting a branch instruction ending said source 
basic block of one of said chains to a long branch in- 50 
struction. 

5. In a digital computer system including a computer 
memory, a method for causing said digital computer 
system to load a computer program comprising a plural- 
ity of procedures, wherein at least some of said proce- 55 



dures when executed by said computer system call oth- 
ers of said procedures, into said computer memory, said 
method comprising the steps of: 

(a) determining the frequency with which each pro- 
cedure calls each other procedure in said computer 
program; and 

(b) placing said procedures in said computer's mem- 
ory in an order that depends on said determined 
frequencies, 

wherein said step of placing said procedures com- 
prises the steps of: 

(c) defining groups of said procedures, each group 
initially comprising one of said procedures, each 
group being defined as being determined or incom- 
pletely determined, said groups being initially de- 
fined to be incompletely determined; 

(d) defining a weight for each pair of said groups, said 
weight initially being the determined frequency 
with which said procedure in one group of said 
each pair called said procedure in the other group 
of said each pair; 

(e) combining a pair of said groups having the highest 
weight to form a new group which replaces each 
group in said pair of groups; 

(0 determining an order for the procedures in said 
new group if one of the groups in said combined 
pair was defined as being incompletely determined, 
said new group being defined as being determined 
if an order is determined for all procedures in said 
new group, and being defined as incompletely de- 
termined otherwise; 

(g) defining weight for each pair of groups containing 
said new group, is said weight being the sum of the 
frequencies with which said new group called the 
other group in said each pair of groups; and 

(h) repeating steps (e) through (g) until only one 
group remains, the order in which said procedures 
are loaded into said computer memory being deter- 
mined by the order of said procedures in said re- 
maining group. 

6. The method of claim 5 wherein said step of deter- 
mining an order for the procedures in a group which 
resulted from the combination of two groups in which 
one of said groups was defined as being incompletely 
determined and in which said combined groups includes 
at least three said procedures, said step of determining 
an order comprises the steps of: 
constructing a list of all possible orderings of proce- 
dures in said new group group; 
eliminating orderings in which a pair of procedures in 
which neither procedure calls the other are placed 
next to each other; and 
choosing one of said remaining orderings as the order 

for the procedures in said new group. 
• • * ♦ • 
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